Preface


Asynchronous Transfer Mode (ATM) networks are widely considered to be the new generation of high-speed communication systems, both for broadband public information highways and for local and wide area private networks. Over recent years there has been a great deal of progress in research and development of ATM technology, but many interesting and important problems remain to be resolved, such as traffic characterisation and control, routing and optimisation, ATM switching techniques and the provision of a specified quality of service.

This book presents twenty-three research papers, from both industry and academia, reflecting the latest original contributions worldwide to the theory and practice of performance modelling and analysis of ATM networks. These papers were selected, subject to peer review, from expanded and revised versions of papers drawn from the eighty-nine shorter papers presented at the Third IFIP Workshop on "Performance Modelling and Evaluation of ATM Networks", held on July 2-6, 1995, at the Craiglands Hotel, Ilkley, West Yorkshire, UK. At least three referees, drawn from the Scientific Committee and from outside it, were involved in the evaluation of each paper.

The research papers are classified into eight parts covering the following topics: Traffic Models and Characterisation; Traffic and Congestion Control; Routing and Optimisation; Adaptation Layer and Protocols; Network Management; Models of ATM Switches; Bandwidth and Admission Control; and Performance Modelling Studies.

Part One on "Traffic Models and Characterisation" includes three papers and is concerned with the modelling and performance implications of multiplexed streams of bursty and correlated traffic in ATM networks. New analytic traffic models are proposed, focusing, respectively, on the characterisation of ATM traffic generated by Variable Bit Rate (VBR) video applications and on the determination of the output burst length of an ATM switch via entropy maximisation. Moreover, a validation study is presented of Markovian models replicating real ATM traffic flows.

Part Two on "Traffic and Congestion Control" addresses fundamental objectives such as guaranteed network performance, traffic prediction and management, and contracted quality of service. This part brings together four papers describing analytic and simulation studies of ATM traffic and congestion control mechanisms. The works are based on flow control at the connectionless layer combined with dynamic bandwidth allocation, Finite Impulse Response (FIR) neural networks, Usage Parameter Control (UPC) functions and the strategy of traffic dispersion.
Part Three on "Routing and Optimisation" focuses on the problems inherent in many services envisaged for ATM networks that involve one-to-one or one-to-many information transfer for multimedia applications. It includes two papers which devise appropriate performance metrics and carry out rigorous comparisons involving pre-planned routing techniques for virtual path restoration, as well as control schemes for virtual path bandwidth and dynamic routing under both static and dynamic traffic conditions.

Part Four on "Adaptation Layer and Protocols" reports a single study, discussing detailed simulation experiments on the adaptability and performance of the Transmission Control Protocol (TCP) when running over high-speed ATM networks. Part Five on "ATM Management" presents one paper concerned with the provision of traffic loss guarantees in economically efficient ATM networks by means of an iterative pricing algorithm which incorporates, as a dynamic feedback signal, a load-dependent price per usage unit of network resources.

Part Six on "Models of ATM Switches" consists of five papers which describe analytic methodologies and cost-effective algorithms for the performance evaluation of various ATM switch architectures, such as Multistage Interconnection Networks (MINs), shared output buffer queues and 3-stage Clos switching networks. The methodologies are based on discrete-time Markovian analysis, the diffusion approximation approach, the maximum entropy principle and a traffic flow formalism for non-blocking operation. Such robust and reliable tools and techniques are of great value for deriving new closed-form expressions and bounds for typical performance measures such as queue length distributions, cell-loss (and blocking) probabilities and end-to-end delays.

Part Seven on "Bandwidth and Admission Control" is concerned with novel methodologies for ATM bandwidth and performance optimisation, call connection control and traffic shaping. This part includes three papers which apply numerical simulation as well as analytical techniques based on theoretical arguments and an iterative Markov chain scheme.

Finally, Part Eight on "Performance Modelling Studies" includes four papers dealing with various ATM performance modelling and evaluation issues. The first two papers apply, respectively, a composite analytical technique to an ATM Clos switching network and Markov chain solutions to fast reservation protocols. The last two papers deal with the important topic of accelerated simulation techniques for ATM networks.

I would like to end this foreword by expressing my thanks to IFIP TC6 and Working Groups WG 6.3 and WG 6.4 for sponsoring the 3rd Workshop on the Performance Modelling and Evaluation of ATM Networks, and to the British Computer Society Performance Engineering Specialist Group; the Performance Engineering Section of BT Labs., UK; Telematics International Ltd., UK; the Departments of Computing, of Electrical Engineering and of Mathematics, University of Bradford; and the Engineering and Physical Sciences Research Council (EPSRC), UK, for their support. My thanks are also extended to the members of the Scientific Committee and the external referees for their invaluable and timely reviews.
PART ONE

Traffic Models and Characterisation


1

Validation and Tuning of an MPEG-1 Video Model

Marco Conti, Enrico Gregori

CNUCE, Institute of the National Research Council

Via S. Maria 36 - 56126 Pisa - Italy, Phone: +39-50-593111

Fax: +39-50-589354, E-mail: {M.Conti, E.Gregori}@cnuce.cnr.it


Abstract

Variable Bit Rate (VBR) video traffic is expected to become one of the major traffic sources for high-speed networks. Although the modeling of VBR video sources has recently received significant attention, there is currently no widely accepted model which lends itself to mathematical analysis. This paper addresses the problem of characterizing the traffic generated by VBR video applications. Specifically, we define an analytically tractable model for the traffic generated by an MPEG-1 encoder. An extensive validation of this model is carried out by analyzing its suitability to capture the statistical behavior of a wide variety of MPEG-1 sources, including movies, sports events and talk shows. We show that our model, with an adequate tuning of its parameters, is able to provide an accurate representation of these different kinds of MPEG-1 sources.


Keywords

Variable bit rate video, MPEG, ATM, statistical multiplexing, Markov chain


1 INTRODUCTION

Recent technological advances in fiber optics and switching systems have provided the technological basis for the development of high-capacity Broadband Integrated Services Digital Networks (B-ISDNs), which are capable of supporting transmission speeds of several hundred Mbps [1]. This enormous potential for fast and massive information transport should be able to support not only the traditional data and voice services, but also a variety of new applications, including the transport of images, teleconferencing, moving video, and large volumes of interactive computer data. Asynchronous Transfer Mode (ATM) is the transfer technique for the implementation of such B-ISDNs, due to its efficiency and flexibility [1].

Work carried out in the framework of the CNR coordinated projects "Gestione del traffico VBR in ambiente interconnesso" (Management of VBR traffic in an interconnected environment) and "Sistemi distribuiti real-time per il supporto di applicazioni multimediali" (Real-time distributed systems for supporting multimedia applications).

Although significant research effort has focused on the development of efficient information multiplexing schemes for the diversified B-ISDN/ATM environment, most of the practical problems related to real-time applications remain unsolved and have not yet been addressed by the ATM Forum [1,2].

Variable Bit Rate (VBR) video is currently by far the most interesting and challenging real-time application. A VBR encoder attempts to keep the quality of the video output constant while reducing bandwidth requirements, since only the minimum amount of information needed has to be transferred. On the other hand, as VBR video traffic is both highly variable and delay-sensitive, high-speed networks (e.g., ATM) generally handle it by assigning peak-rate bandwidth to VBR video applications and using the residual bandwidth for non-real-time traffic. This approach may, however, be inefficient, since a source transmits at its peak rate only occasionally. To define bandwidth allocation schemes which provide an adequate QoS for VBR applications while minimizing wasted bandwidth, the effects of video applications on the network must be investigated.

VBR video generates traffic with complex characteristics which cannot be effectively described by traditional traffic models. Developing accurate and analytically tractable representation schemes for real-time traffic will provide a basis for the development of efficient multiplexing schemes and increased utilization of network resources.

While the modeling of VBR video sources has recently received significant attention [1, 2, 4, 10, 11, 32, 13, 14], there is currently no widely accepted model which lends itself to mathematical analysis. Furthermore, new video compression standards, such as the MPEG family [7, 8, 15], are emerging. In this paper we focus on the MPEG-1 coding algorithm. The MPEG-1 standard specifies the coding algorithm for full-motion video information with an output peak rate in the 1.5-2 Mb/s range. Starting from the analysis of the trace of the movie "Star Wars" encoded with the MPEG-1 algorithm, we proposed an analytically tractable model in [6]. In this paper we validate this model by analyzing its suitability to capture the statistical behavior of a wide variety of MPEG-1 sources, including movies, sports events and talk shows. Depending on the class of source, the output of an MPEG-1 coder exhibits very different statistical behavior. Movies have, in general, a "heavy-tailed" autocorrelation function, while in TV sports and "talk shows" the correlation disappears after a few seconds. We show that the "heavy-tailed" behavior has a significant impact on the buffer-size statistics in ATM multiplexers.

Our model, with an adequate tuning of its parameters, is able to provide an accurate representation of these different kinds of MPEG-1 sources. Specifically, by exploiting experimental results, we have identified a fitting procedure which relates the behavior of a real source (mainly the tail of its autocorrelation function) to the model-parameter values that best fit that behavior.

The paper is organized as follows. Section 2 presents the characteristics of MPEG-1 sources relevant to our investigation. An analytical model of MPEG-1 sources is described in Section 3 and validated in Section 4. The tuning of the model is investigated in Section 5.
2 MPEG-1 VIDEO SOURCE

An uncompressed video source may generate bits at rates as high as hundreds of Mbps (megabits per second); in other words, a few such sources could occupy the entire network capacity available today. Data compression techniques are therefore employed to reduce the bit rate that the video source transmits over the network. The resulting traffic is highly variable and depends on the encoding scheme adopted and on the activity in the movie.

VBR video is currently considered to be the dominant bandwidth-demanding real-time application for high-speed networks in the immediate future. Developing accurate and analytically tractable models for this kind of traffic will thus provide a basis for the design and development of these networks. Before this can be done, we need to fully understand the characteristics of the video source.

2.1 MPEG-1 coding scheme

MPEG-1 is a specification for coding moving pictures, developed by the ISO Moving Picture Experts Group (MPEG). The standard is well suited for a large range of video applications at a variety of bit rates. A combination of video and audio information, particularly for "movie" applications, can also be compressed. Typical compression ratios are in the range of 50:1 to 200:1 [3]. The algorithm is asymmetrical; that is, compressing video requires more computation than decompressing it. It is therefore well suited to applications in which video is compressed only once but decompressed frequently. A very good example of this is Video On Demand (VOD).
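As a rough illustration of the rates quoted in this section, the sketch below computes the bit rate of an uncompressed source and what remains after compression at the 50:1 and 200:1 ratios cited from [3]. The frame format used (720 × 576 pixels, 24 bits per pixel, 25 frames/s) is an assumption chosen only for the example; the paper does not specify one.

```python
def raw_bit_rate(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Bit rate (bits/s) of an uncompressed video source."""
    return width * height * bits_per_pixel * fps


def compressed_bit_rate(raw_bps: float, ratio: float) -> float:
    """Bit rate left after compressing with a ratio of `ratio`:1."""
    return raw_bps / ratio


if __name__ == "__main__":
    # Assumed frame format, for illustration only (not taken from the paper).
    raw = raw_bit_rate(720, 576, 24, 25)
    print(f"uncompressed: {raw / 1e6:.1f} Mbps")
    for ratio in (50, 200):  # range of compression ratios quoted in [3]
        print(f"{ratio}:1 -> {compressed_bit_rate(raw, ratio) / 1e6:.2f} Mbps")
```

Under these assumptions the raw source needs about 249 Mbps, in line with the "hundreds of Mbps" mentioned above, and compression brings it down to roughly 1.2-5 Mbps.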

MPEG-1 is an interframe coder. In addition to intraframe coding, coders in this class exploit the temporal redundancy between adjacent frames by predicting the next frame from the current one. A key feature that distinguishes MPEG-1 from previous coding algorithms is bidirectional temporal prediction: some of the frames are encoded using two reference frames, one in the past and one in the future. This results in higher compression gains.

When applying MPEG-1 to video, one of three different coding modes can be used for each frame. The terminology used for the resulting frame depends on the mode used, as follows:

• I-frame: intra-frame coded, i.e., coded independently of other frames with a JPEG-like (DCT-based) scheme.

• P-frame: predictive coded with reference to a past picture.

• B-frame: bidirectionally predictive coded, with reference to both a past and a future picture.

I-frames provide access points for random access, but only with moderate compression. P-frames are also generally used as a reference for future P- and B-frames. B-frames provide the highest amount of compression but require both a past and a future reference for prediction. In addition, B-frames are never used as reference frames.

In the encoded sequence, as shown in Figure 1, the frames are arranged into groups. In this case a group consists of 12 frames: one I-frame, three P-frames and eight B-frames. Figure 1 also shows the relationship between the frames: I-frames are independent, P-frames are predicted, and B-frames are bidirectionally predicted.
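The group structure just described can be sketched as follows. The code counts the frame types in the 12-frame pattern and, for each frame, lists the display-order positions of the anchor (I or P) frames it is predicted from. It assumes an open group, i.e., that the two trailing B-frames use the I-frame of the following group (index 12) as their future reference; this detail is our assumption, not something stated in the text. The pattern is assumed to start with an I-frame.

```python
GOP_PATTERN = "IBBPBBPBBPBB"  # group of 12 frames, as in Figure 1


def frame_type_counts(pattern: str) -> dict:
    """Count the I-, P- and B-frames in a group pattern."""
    return {t: pattern.count(t) for t in "IPB"}


def reference_frames(pattern: str, index: int) -> tuple:
    """Display-order indices of the anchor frames used to predict frame `index`.

    I-frames use no reference, P-frames use the previous anchor (I or P) and
    B-frames use the previous and the next anchor. Assumption (open group):
    B-frames after the last anchor refer to the next group's I-frame, which
    sits at index len(pattern).
    """
    kind = pattern[index]
    if kind == "I":
        return ()
    # Nearest anchor before `index` (the pattern starts with an I-frame).
    past = max(i for i in range(index) if pattern[i] in "IP")
    if kind == "P":
        return (past,)
    # Nearest anchor after `index`, or the next group's I-frame.
    future = next((i for i in range(index + 1, len(pattern)) if pattern[i] in "IP"),
                  len(pattern))
    return (past, future)


if __name__ == "__main__":
    print(frame_type_counts(GOP_PATTERN))
    for i, kind in enumerate(GOP_PATTERN):
        print(i, kind, reference_frames(GOP_PATTERN, i))
```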
Figure 1 A sequence of MPEG-1 frames and their relationship.

2.2 Statistics of MPEG-1 coded movies

Figure 2 shows a small extract from the output of the MPEG-1 coded movie Star Wars, released by M. Garrett at Bellcore. Specifically, frames are coded into groups of twelve frames as defined in Figure 1 (i.e., the frame pattern is IBBPBBPBBPBB).

As shown in Figure 2, the bandwidth required to transmit consecutive frames is highly variable and depends strongly on the frame type (I, P or B). Furthermore, as expected from the coding algorithm, the shape of the output repeats every twelve frames.


Figure 2 Part of the MPEG-1 coder trace, revealing group length and frame pattern.

Statistical analysis indicates that the output of an MPEG-1 encoder should be described by three partially correlated¹ submodels, each describing the output process corresponding to one frame type. This leads to a model with a very large state space.

The complexity of the model's state space is reduced by avoiding a separate representation for the various frame types. Specifically, this is obtained by considering a different time scale, in which the time unit is the group (i.e., a sequence IBBPBBPBBPBB) and the bit rate per time unit is the sum of the bandwidth generated by all the frames in a group. In this case one group is equal to 12 frames, and a frame is generated every 1/24th of a second, so one group spans 0.5 seconds. The resulting sequence is hereafter called the aggregate sequence.

Figure 3 The aggregate sequence.


Figure 4 Low-frequency component of the aggregate sequence.

Figure 3 shows a plot of the aggregate sequence generated by an MPEG-1 coder with Star Wars as the source. A time unit on the x-axis is equal to 0.5 seconds, i.e., one group interarrival time.
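A minimal sketch of how the aggregate sequence can be built from a per-frame trace (one bit count per frame) and how group counts translate into time; Eq. (3.2) in Section 3 states the same summation. Dropping an incomplete final group is our assumption; the text does not say how a trailing partial group is handled.

```python
GROUP_LENGTH = 12       # frames per group (IBBPBBPBBPBB)
FRAMES_PER_SECOND = 24  # one frame every 1/24 s


def aggregate(frame_bits: list[int], group_length: int = GROUP_LENGTH) -> list[int]:
    """Bits per group: sum of the bits of each complete group of frames."""
    n_groups = len(frame_bits) // group_length  # an incomplete last group is dropped
    return [sum(frame_bits[g * group_length:(g + 1) * group_length])
            for g in range(n_groups)]


def groups_to_seconds(n_groups: float, group_length: int = GROUP_LENGTH) -> float:
    """Duration of n_groups time units, in seconds."""
    return n_groups * group_length / FRAMES_PER_SECOND


if __name__ == "__main__":
    print(groups_to_seconds(1))          # one time unit
    print(groups_to_seconds(30))         # short-range horizon of Figure 5
    print(groups_to_seconds(3500) / 60)  # long-range horizon of Figure 6, in minutes
```

With these constants one group lasts 0.5 s, 30 groups last 15 s and 3500 groups about 29 minutes, matching the figures quoted in the text.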

The bit rate of consecutive frames shows that the bandwidth changes in a rapid but bursty way. However, there is also a slowly changing underlying structure. This low-frequency structure of the sequence can be better highlighted by passing the aggregate sequence through a moving average filter of length W. The result of this filtering process with a window size W = 300 groups (i.e., two and a half minutes) is shown in Figure 4. These characteristics of the aggregate sequence can be highlighted even better by observing its autocorrelation function.

1. A precise analysis of the correlation between the three different subsequences is presented in [6].
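The moving-average smoothing described above can be sketched as follows. This is a minimal version under two assumptions of ours: the window covers W consecutive groups starting at the current one, and only positions where the full window fits are kept (the paper does not say how the ends of the sequence are treated). The exact alignment of the window used by the authors is the one given in Eq. (3.3).

```python
def moving_average(a: list[float], w: int) -> list[float]:
    """Mean of every run of `w` consecutive values (valid part only, no padding).

    Element i of the result averages a[i], ..., a[i + w - 1], so the output
    has len(a) - w + 1 elements.
    """
    if not 1 <= w <= len(a):
        raise ValueError("window must satisfy 1 <= w <= len(a)")
    window_sum = sum(a[:w])
    out = [window_sum / w]
    for i in range(w, len(a)):
        # Slide the window by one group: add the newest value, drop the oldest.
        window_sum += a[i] - a[i - w]
        out.append(window_sum / w)
    return out
```

Keeping a running sum makes the filter linear in the trace length, which matters for W = 300. At 0.5 s per group, that window spans 150 s, i.e., the two and a half minutes quoted above.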


Figure 5 Short-range dependencies.

The autocorrelation function of the aggregate sequence is plotted in Figures 5 and 6, showing the short-range (n ≤ 30) and long-range (n > 30) dependencies, respectively. Figure 5 shows a strong short-range dependence for time lags below approximately 30 groups, which corresponds to 15 seconds. In this range the autocorrelation function drops quickly (the autocorrelation at n = 30 is about 0.2). However, after this sharp initial decrease, as shown in Figure 6, it takes a very long time before the autocorrelation function drops to zero. Specifically, Figure 6 highlights a significant long-range dependence which lasts for time lags up to 3500 groups, i.e., about 29 minutes. The tail of the autocorrelation function decreases slowly. Similar behavior of the autocorrelation function has been observed in [15] for other sequences.
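Autocorrelation curves such as those in Figures 5 and 6 can be computed with the standard sample estimator sketched below. The paper does not state which estimator was used, so the normalization (dividing by the lag-0 sum, so that r(0) = 1) is an assumption. Lags are in groups, so n = 30 corresponds to 15 seconds.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation r(n) for n = 0..max_lag.

    Uses the common biased estimator
        r(n) = sum_k (x_k - mean)(x_{k+n} - mean) / sum_k (x_k - mean)^2,
    so that r(0) = 1.
    """
    m = len(x)
    if m < 2 or not 0 <= max_lag < m:
        raise ValueError("need at least two samples and 0 <= max_lag < len(x)")
    mean = sum(x) / m
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("constant sequence: autocorrelation undefined")
    return [sum(d[k] * d[k + n] for k in range(m - n)) / denom
            for n in range(max_lag + 1)]
```

This direct form costs O(len(x) · max_lag) operations; for lags in the thousands on a long trace, an FFT-based computation of the same estimator would be faster.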


Figure 6 Long-range dependencies.
3 MPEG-1 MODELING

Figures 5 and 6 show that the aggregate sequence exhibits both short-range dependencies, which last for around 20-30 groups (some seconds), and long-range dependencies, which last for thousands of groups (some minutes). In order to capture both types of dependencies, a bidimensional Markov chain $\{(L_k, H_k), k \ge 0\}$ is used, in which $\{L_k, k \ge 0\}$ represents the long-term correlation, while $\{H_k, k \ge 0\}$ represents the short-term correlation. Specifically, in our model the process $\{H_k, k \ge 0\}$ describes the bit rate per group of an MPEG-1 encoder. To avoid unnecessary complexity in the state space of $\{H_k, k \ge 0\}$, we quantize the bit rate information into a number of levels. The number of quantization levels for this process will hereafter be denoted by $N$ (i.e., $H_k \in \{0, 1, 2, \ldots, N-1\}$).

The question of which quantization method should be used is not discussed here. For us it seemed natural to use uniform quantization. To this end, let max and min denote the maximum and minimum bit rates observed in the aggregate sequence. The possible bit rates between max and min are quantized with a constant step size $\Delta = (\max - \min)/N$, so that the actual bit rate of the source is $j \cdot \Delta + \min$, where $j$ is the quantization level, $0 \le j \le N-1$.
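The uniform quantization just described can be sketched as follows. The step size and the reconstruction rule j · Δ + min are taken from the text; the maximum value itself would fall exactly on level N, and clamping it to level N − 1 is our assumption, since the text does not say how that value is assigned.

```python
def quantize(values: list[float], n_levels: int) -> tuple[list[int], float, float]:
    """Uniformly quantize `values` into levels 0..n_levels-1.

    Step size delta = (max - min) / n_levels, as in the text. The maximum
    would land on level n_levels, so it is clamped to n_levels - 1.
    Returns the levels together with (delta, min) for reconstruction.
    """
    lo, hi = min(values), max(values)
    delta = (hi - lo) / n_levels
    if delta == 0:  # constant input: everything on level 0
        return [0] * len(values), delta, lo
    levels = [min(int((v - lo) / delta), n_levels - 1) for v in values]
    return levels, delta, lo


def level_to_bit_rate(j: int, delta: float, lo: float) -> float:
    """Bit rate represented by quantization level j: j * delta + min."""
    return j * delta + lo
```

For the Star Wars aggregate sequence (Table 1: minimum 77754 and peak 932710 bits/group) and N = 8, this gives Δ = 106869.5 bits/group.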

To represent the low-frequency component of an MPEG source, a modulating process $\{L_k, k \ge 0\}$ is included in the model as well ($L_k \in \{0, 1, 2, \ldots, M-1\}$). The transitions in the Markov chain occur every time unit (i.e., every group interarrival), while the process $\{L_k\}$ changes its state on average only after 70-100 time units.

The process we want to model now takes the form $\{(L_k, H_k), k \ge 0\}$, where $L_k \in \{0, \ldots, M-1\}$ is the state of the low-frequency process corresponding to the $k$th group, and $H_k \in \{0, \ldots, N-1\}$ is the corresponding state of the high-frequency process.

The transition probabilities of the Markov chain, denoted by $p_{ij,lm}$,

$$p_{ij,lm} = P(L_k = l, H_k = m \mid L_{k-1} = i, H_{k-1} = j), \tag{3.1}$$

are estimated from an MPEG-1 trace through the procedure reported below. To explain the procedure better, we use $\{f_0, f_1, f_2, \ldots\}$ to denote the frame sequence in the original sequence.

Procedure for the computation of the transition probabilities $p_{ij,lm}$

1. Produce the aggregate sequence (high-frequency sequence) $\{a_0, a_1, a_2, \ldots\}$, where $a_i$ denotes the bit rate of the $i$-th group:

$$a_i = \sum_{j=0}^{11} f_{12i+j} \tag{3.2}$$

The number 12 refers to the group length used by the MPEG-1 coder (see Figure 1).

2. Produce the aggregate filtered sequence (low-frequency sequence) $\{\bar{a}_0, \bar{a}_1, \ldots\}$, where $\bar{a}_i$ denotes the bit rate of the $i$-th group in the filtered sequence. $\bar{a}_i$ is obtained by passing the aggregate sequence through a moving average filter of length $W$:

$$\bar{a}_i = \frac{1}{W} \sum_{j=-\lfloor W/2 \rfloor}^{\lceil W/2 \rceil - 1} a_{i + \lceil W/2 \rceil + j} \tag{3.3}$$

3. Quantize the low- and high-frequency sequences into $M$ and $N$ uniform levels, respectively. Number the levels from 0 to $M-1$ and from 0 to $N-1$.

4. Using the quantized low- and high-frequency sequences, measure the one-step transition probability matrix $P$ consisting of

$$p_{ij,lm} = P(L_k = l, H_k = m \mid L_{k-1} = i, H_{k-1} = j),$$

where $L_k$ is the $k$th element of the quantized low-frequency sequence, $H_k$ is the $k$th element of the quantized high-frequency sequence, $i, l \in \{0, \ldots, M-1\}$ and $j, m \in \{0, \ldots, N-1\}$.
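Step 4 can be sketched as a relative-frequency (maximum-likelihood) estimate obtained by counting the one-step transitions observed in the trace. The sketch assumes that the two quantized sequences are already aligned group by group; how the filtered sequence is aligned with the aggregate one is fixed by Eq. (3.3) and is not repeated here. States that are never left in the trace get no row, and how the authors treat such states is not stated. The helper `max_low_level_jump` is a hypothetical diagnostic of ours that reports the largest change of low-frequency level seen in one transition.

```python
from collections import Counter, defaultdict

State = tuple[int, int]  # (L_k, H_k): low- and high-frequency level


def estimate_transitions(low: list[int], high: list[int]) -> dict[State, dict[State, float]]:
    """Estimate p_{ij,lm} = P(L_k=l, H_k=m | L_{k-1}=i, H_{k-1}=j) by counting.

    low[k] and high[k] are the quantized levels of group k. Only states that
    are left at least once in the trace get a row.
    """
    if len(low) != len(high):
        raise ValueError("low- and high-frequency sequences must be aligned")
    states = list(zip(low, high))
    counts: dict[State, Counter] = defaultdict(Counter)
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1  # one observed transition prev -> curr
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {curr: c / total for curr, c in row.items()}
    return probs


def max_low_level_jump(probs: dict[State, dict[State, float]]) -> int:
    """Largest |l - i| over all estimated transitions (at most 1 for a block-tridiagonal P)."""
    return max(abs(l - i) for (i, _), row in probs.items() for (l, _) in row)
```

With M = N = 8 the chain has 64 states, so the estimated matrix has at most 64 × 64 entries.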

The models presented throughout are obtained with parameters M = 8 and N = 8. In [6] we showed that these parameter values represent a good compromise between precision and complexity.

Specifically, by applying the fitting procedure to the Star Wars sequence, we obtain a Markov chain with the transition matrix P shown in (3.4). The submatrices $A_{ii}$ of P contain the probabilities that the process does not change its low-frequency level in a transition, i.e., $P\{L_k = i, H_k = m \mid L_{k-1} = i, H_{k-1} = j\}$. The submatrices $A_{i,i+1}$ ($A_{i,i-1}$) contain the probabilities that the process moves to the next (previous) low-frequency level in a transition, i.e., $P\{L_k = i+1, H_k = m \mid L_{k-1} = i, H_{k-1} = j\}$ ($P\{L_k = i-1, H_k = m \mid L_{k-1} = i, H_{k-1} = j\}$). All other blocks are zero.

$$
P = [p_{ij,lm}] =
\begin{bmatrix}
A_{00} & A_{01} & & & & & & \\
A_{10} & A_{11} & A_{12} & & & & \mathbf{0} & \\
 & A_{21} & A_{22} & A_{23} & & & & \\
 & & A_{32} & A_{33} & A_{34} & & & \\
 & & & A_{43} & A_{44} & A_{45} & & \\
 & & & & A_{54} & A_{55} & A_{56} & \\
 & \mathbf{0} & & & & A_{65} & A_{66} & A_{67} \\
 & & & & & & A_{76} & A_{77}
\end{bmatrix}
\tag{3.4}
$$

4 MODEL VALIDATION

In this section we analyze the characteristics of the model to see whether it can imitate the behavior of "Star Wars". Table 1 outlines the basic statistics of the MPEG-1 Star Wars trace, where "original" and "aggregate" indicate the sequences with the frame and the group as time units, respectively (see Section 2.2). Figure 7 plots the autocorrelation function r(n) of the real source and of four models constructed with different moving average window lengths W. Time lags used for the calculation range from 0 to 30 groups. The plot thus compares the short-range dependence of the real source with that of different parametrizations of the model. The model constructed with W = 20 has a stronger short-range dependence than the real source. It has, however, a faster decay: even though r(n) of the model is still above that of the real source at a time lag of 30 groups, the difference is smaller than for n = 10. As the value of W is increased, the autocorrelation function of the model tends to fall off at the beginning, but then decreases more slowly. The model with W = 40 is a good example of this behavior. For n less than 7, the short-range dependence of the model takes on values lower than those of the real source. For time lags beyond this point, the plot shows that r(n) of the model decays more slowly than that of the real source. The autocorrelation functions for models constructed with W = 60 and W = 80 follow the same pattern.

Table 1 Basic statistics for the MPEG-1 "Star Wars" movie.

Measure                          Original sequence    Aggregate sequence
Mean bandwidth, μ                15598 bits/frame     187185 bits/group
Standard deviation, σ            18165 bits/frame     72468 bits/group
Coefficient of variation, σ/μ    1.16                 0.39
Peak bandwidth                   185267 bits/frame    932710 bits/group
Minimum bandwidth                476 bits/frame       77754 bits/group
Peak/mean bandwidth              11.88                4.98

Figure 7 Comparison of the real source's and the model's short-range dependencies (M = 8, N = 8); x-axis: time lag (groups).
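The derived rows of Table 1 follow from the other rows: the coefficient of variation is σ/μ and the peak-to-mean ratio is peak/μ. The short check below recomputes them from the tabulated values; the dictionary simply transcribes Table 1.

```python
# Values transcribed from Table 1 (MPEG-1 "Star Wars").
TABLE_1 = {
    "original": {"mean": 15598, "std": 18165, "peak": 185267},    # bits/frame
    "aggregate": {"mean": 187185, "std": 72468, "peak": 932710},  # bits/group
}


def derived_statistics(mean: float, std: float, peak: float) -> tuple[float, float]:
    """Coefficient of variation (sigma/mu) and peak-to-mean ratio."""
    return std / mean, peak / mean


for name, s in TABLE_1.items():
    cv, ratio = derived_statistics(s["mean"], s["std"], s["peak"])
    print(f"{name}: CV = {cv:.2f}, peak/mean = {ratio:.2f}")
```

Aggregation over a group of 12 frames reduces both the coefficient of variation (from 1.16 to 0.39) and the peak-to-mean ratio (from 11.88 to 4.98), while the aggregate mean is close to 12 times the per-frame mean.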

As shown in Section 2.2, the long-range dependence of MPEG-1 coded VBR video is very strong. Figure 8 compares the autocorrelation function of the model and of the real source for time lags from 0 up to 2000 groups, for several values of W. A model constructed with W = 100 has a long-range dependence which is not as strong as that of the real source for n greater than 100. It also reaches zero at a time lag of approximately 1100, which is much earlier than in the real source. The r(n) curves of the other models plotted in Figure 8 show that the long-range dependence of the model tends to get stronger, and thus approaches that of the real source, as W is increased. For example, a model constructed with W = 300 matches the long-range dependence of the real source better than one created with W = 200. At the same time, the short-range results above show that a higher value of W implies a weaker short-range dependence.

Figure 8 Comparison of the real source's and the model's long-range dependence.

4.1 Importance of the Long-Term Dependencies

In the previous section we have shown that our model, depending on the setting of the parameter W, is able to precisely capture either short- or long-range dependencies. In this section we investigate the importance of both types of dependencies in the study of statistical multiplexing. Specifically, the aim of this analysis is to study how the smoothing of the traffic profile obtained by superposing several VBR video sources depends on the type of correlations present in the traffic.

Multiplexing of VBR video sources is complex, as these applications have low tolerance towards network congestion. Although sufficient buffer capacity may be available, excessive buffering may not be possible because of the resulting unacceptable delays. In this section we therefore investigate the queueing time distribution experienced by VBR video traffic as a function of the bandwidth reserved for each source. As shown before (see Table 1), the peak rate of our MPEG source corresponds to a bandwidth level c=7, while the average corresponds to a bandwidth level of about 0.53. Below we investigate the delay experienced by VBR video traffic assuming that the bandwidth allocated to each source is about twice the average, i.e., c=1. The results reported in Figure 9 were obtained by simulating the queueing delay distribution in a single-server FIFO queueing system with deterministic service time, fed by 5 independent and identically distributed MPEG-1 sources. Note that we use our analytical model to study the statistical multiplexing problem because reliable statistical estimates cannot be obtained from a single real trace. Furthermore, identically distributed sources cannot be obtained by using different traces.² Our model, on the other hand, can generate as many i.i.d. traces as are needed to obtain statistics with the desired precision.

Figure 9 Tail of delay distribution, c = 1.0
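To make the simulation set-up concrete, here is a minimal Python sketch of a slotted FIFO queue with deterministic service fed by several superposed sources. It assumes one slot per group of pictures and a service capacity of c bandwidth units per source per slot, and it uses a hypothetical AR(1) generator (`ar1_source`) as a stand-in for the MPEG-1 model. That stand-in has only short-range correlation, so it is not the authors' model and, by the argument in this section, would be expected to give optimistic tails.

```python
import random

def lindley_delays(arrivals, capacity):
    """Approximate queueing delay (in slots) at the end of each slot.

    Discrete-time Lindley recursion: the backlog grows by the work arriving
    in a slot and shrinks by the work served (`capacity` per slot). Under
    FIFO with deterministic service, the most recent unit of work waits
    roughly backlog / capacity further slots.
    """
    backlog = 0.0
    delays = []
    for a in arrivals:
        backlog = max(0.0, backlog + a - capacity)
        delays.append(backlog / capacity)
    return delays

def tail_probability(samples, threshold):
    """Empirical P(X > threshold)."""
    return sum(1 for s in samples if s > threshold) / len(samples)

def ar1_source(n, mean, rho, sigma, rng):
    """Hypothetical stand-in source: AR(1) around `mean`, clipped at zero.

    It has only short-range (geometrically decaying) correlation.
    """
    x, out = mean, []
    for _ in range(n):
        x = mean + rho * (x - mean) + rng.gauss(0.0, sigma)
        out.append(max(0.0, x))
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    n_sources, c, slots = 8, 1.0, 20_000  # c: bandwidth reserved per source
    streams = [ar1_source(slots, 0.53, 0.9, 0.1, rng) for _ in range(n_sources)]
    aggregate = [sum(vals) for vals in zip(*streams)]
    delays = lindley_delays(aggregate, c * n_sources)
    for d in (0.1, 0.5, 1.0):
        print(f"P(delay > {d} slots) = {tail_probability(delays, d):.2e}")
```

Replacing `ar1_source` with traces from a model that preserves long-range dependence is the step the text identifies as decisive for the tail.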

Figure 9 shows that a 53% network utilization with acceptable delays can be achieved if at least eight sources are multiplexed. In addition, the figure clearly indicates that the tail estimated with W=20 is severely underestimated in the region (1E-04, 1E-02). These results show that long-term correlation significantly affects the tail of the delay distribution even for a lightly loaded network (e.g., network utilization of the order of 50%). Partially neglecting it (W=20) leads to optimistic estimates with errors of the order of 100%. These observations have been confirmed by the extensive analysis presented in [6].

4.2  Model  Validation  by  exploiting  other  MPEG-1  sources 

In this section we extend the validation process by analyzing the suitability of the model for capturing the statistical behavior of other MPEG-1 sources. Specifically, we consider a wide variety of sources, including movies, sports events and talk shows. The traces of these sources, encoded with the MPEG-1 algorithm using the parameters reported in Table 2, were released by O. Rose [10].

We measured the autocorrelation function of several real traces and found that it is highly variable and depends on the kind of video sequence. We identified two extreme cases: movies and sports events. The differences between the two classes are highlighted by Figures 10 and 11. Specifically, the autocorrelation function of sports events drops to zero within a few seconds (see Figure 10) and then oscillates around zero.
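The autocorrelation curves discussed here can be estimated from a bits-per-GOP trace with the usual biased sample estimator sketched below (the authors' exact estimator is not stated). The lag-to-seconds helper assumes 25 frames/s together with the GOP size of 12 from Table 2; under these assumptions 80 groups correspond to about 38 s, consistent with the text's equivalence of 80 groups to about 40 seconds.

```python
def sample_autocorrelation(x, max_lag):
    """Biased sample autocorrelation r(k), k = 0..max_lag.

    `x` is a trace such as the number of bits per group of pictures (GOP).
    Dividing by n (not n - k) keeps the estimate bounded but shrinks long
    lags toward zero, which matters for short traces.
    """
    n = len(x)
    if n < 2 or max_lag >= n:
        raise ValueError("need at least 2 samples and max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("constant trace: autocorrelation undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]

def lag_to_seconds(lag_groups, gop_size=12, frame_rate=25.0):
    """Convert a lag in GOPs to seconds (assumes 25 frames/s)."""
    return lag_groups * gop_size / frame_rate
```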


2. The distribution of the number of bits per group depends strongly on the movie.
Table 2: Encoder parameters

Parameter              Value
Encoder input          384 x 288 pel
Colour format          YUV (4:1:1, resolution of 8 bits)
Quantization values    I=10, P=14, B=18
Pattern                IBBPBBPBBPBB
GOP size               12
Motion vector search   ‘Logarithmic’ / ‘Simple’
Reference frame        ‘Original’
Slices                 1
Vector / range         Half pel / 10


Figure 10 Autocorrelation function for sports events (x-axis: time lag in groups).

On the other hand, movies have a heavy-tailed autocorrelation function. As shown in Figure 11, in “Terminator II” the correlation between frames disappears after about 40 seconds (80 groups), while up to 4 minutes (500 groups) are necessary to lose the correlation in “Jurassic Park”. Note that the precision of the autocorrelation estimates is affected by the relatively short length of the traces (30 minutes).³ This limited amount of data is responsible for the oscillating behavior of the tail of the autocorrelation function.

3. It is worth recalling that the autocorrelation function of the “Star Wars” movie drops to zero very slowly, in about 30 minutes. The trace of this movie covered two hours of video (the whole movie), so the estimated tail has very small fluctuations, whereas the newly available traces last approximately half an hour, and fluctuations of the order of 0.1 make it almost impossible to analyze the tail of the autocorrelation below this value.

Figure 11 Autocorrelation function for movies.

We now investigate the model’s flexibility and effectiveness in capturing the behavior of the various real sources. To this end, for each trace we plot the autocorrelation function of the real source together with the autocorrelation obtained from the model tuned with different values of the window size W. As shown before, short- or long-range dependencies can be emphasized by varying the parameter W. The analysis of statistical multiplexing presented in the previous section showed that long-term correlation significantly affects the tail of the delay distribution: partially neglecting it leads to optimistic estimates with errors of the order of 100%. For this reason, below we focus primarily on capturing the tail of the long-term correlation.


Figure 12 Comparison of the real source’s and the model’s autocorrelation function for a “Formula-1 car race”.
Dependencies between frames in sports events last for a few seconds and are well captured by filtering with a very small W. Figure 12 indicates that W=2 is the best choice for the “Formula-1 car race” trace.

Figure 11 shows examples of movies (e.g., “The Silence of the Lambs” and “James Bond: Goldfinger”) for which the tail of the autocorrelation function quickly goes down to zero (i.e., the tail is steep). In these cases the low-frequency components are captured very well by the model using small window sizes. For example, as shown in Figure 13, W=15 provides the best fit for “The Silence of the Lambs”. A similar behavior can also be observed for the “James Bond: Goldfinger” trace, for which we identified W=60 as the best window size.

Figure 13 Comparison of the real source’s and the model’s autocorrelation function for “The Silence of the Lambs”.

“Terminator II” and “Jurassic Park”, on the contrary, show a slower decrease in the tail of their autocorrelation functions.⁴ We regard “Terminator II” as having a slow autocorrelation decay because, although its autocorrelation goes down to zero in about 40 seconds, the second part of the decay starts from an autocorrelation value of 0.15, so the slope of the tail (which is what we are interested in) is very small.

As expected, large window sizes are needed to capture the low-frequency component of “Terminator II” and “Jurassic Park”. Specifically, as shown in Figure 14, the model with W=340 fits the tail of the autocorrelation function of the “Jurassic Park” trace well, except in the middle, where the model gives a slightly lower autocorrelation. Similarly, W=300 is identified as the best window size for “Terminator II”.


4. In these cases, however, the fluctuations in the tail of the autocorrelation (computed from a limited amount of real data) make it difficult to find the best window sizes.
Figure 14 Comparison of the real source’s and the model’s autocorrelation function for the movie “Jurassic Park” (x-axis: time lag in groups).


5  MODEL  TUNING:  CHOICE  OF  THE  W  PARAMETER  VALUE 

The analysis presented in the previous section shows that:

1. Depending on the type of source (e.g., sports events, movies), the autocorrelation function of the sequence changes completely: the low-frequency component is always significant in movies, while it is almost negligible in sports events.

2. Sources of the same type may differ significantly. For example, among the movies considered in this work, frames become almost independent after 40 seconds in “The Silence of the Lambs”, while positive correlations still exist after 10-20 minutes in “Star Wars”.

We can thus conclude that no general model exists for MPEG-1 sources, and that not even a single model can be defined to characterize one type of MPEG-1 source (e.g., movies). The goal of MPEG-1 modeling is therefore to define a set of rules that identify, for each MPEG-1 source, the model-parameter values that best capture the behavior of the source. We have identified this set of rules in steps 1-4 (page 10) of the fitting procedure presented in Section 3. However, the best value of the model parameter W depends strongly on the source, and identifying the relationship between a real source and the window size W that best captures the tail of its autocorrelation function is still an open issue. In the analysis presented in this work, the best W value seems to depend on the slope of the autocorrelation function. Roughly speaking, the autocorrelation function shows two behaviors: a fast decrease over the first frames (e.g., 20-30 frames) and a slower decrease in its tail. A first-order estimate of this behavior can be obtained by fitting each of these portions with a straight line and approximating the speed of decrease of the autocorrelation function by the slope, m, of the fitted line.
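The straight-line approximation of the decay speed can be computed with an ordinary least-squares fit over a chosen range of lags, as sketched below. The lag ranges used for the fast initial decrease and for the tail are assumptions to be chosen per trace, and the name `tail_slope` is illustrative.

```python
def tail_slope(acf, start, end=None):
    """Least-squares line r(k) ~ m*k + b over lags start <= k < end.

    Returns (m, b); m approximates the decay speed of that portion of the
    autocorrelation function.
    """
    end = len(acf) if end is None else end
    ks = range(start, end)
    n = len(ks)
    if n < 2:
        raise ValueError("need at least two lags to fit a line")
    mean_k = sum(ks) / n
    mean_r = sum(acf[k] for k in ks) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (acf[k] - mean_r) for k in ks)
    m = sxy / sxx
    return m, mean_r - m * mean_k
```

For example, `tail_slope(acf, 0, 25)` would approximate the fast initial decrease and `tail_slope(acf, 25)` the slower tail.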

As we are mainly interested in capturing long-term correlations, below we apply this approach to estimate the decay speed in the tail of the autocorrelation functions of different movies. Figures 15, 16 and 17 show the fitted line and the tail of the autocorrelation function for three movies (“The Silence of the Lambs”, “Jurassic Park” and “Star Wars”), which exhibit short-, medium- and long-term correlations, respectively. Each graph also reports the equation of the fitted line.

Figure 15 Slope of the “The Silence of the Lambs” autocorrelation.

Figure 16 Slope of the “Jurassic Park” autocorrelation (x-axis: time lag).

Figure 17 Slope of the “Star Wars” autocorrelation.

Comparing the various movies, we note that as the magnitude of the slope increases (i.e., the tail decays faster), the best window size decreases. For example, for “The Silence of the Lambs” m is of the order of -2.5 × 10^-3 and W = 15; for “Jurassic Park” m ≈ -2.4 × 10^-4 and W = 340; and finally, for “Star Wars” m ≈ -7.4 × 10^-5 and W = 400. These results confirm the trend, but a mathematical formulation of this relationship is still needed.

By plotting the pairs of values (m, W) for the different movies and fitting these points with a hyperbolic function (see Figure 18), we have identified a heuristic rule: m × W ≈ constant. Hence, the function shown in Figure 18 can be used to identify the best window size for a given source.
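One way to fit the heuristic m × W ≈ constant is a least-squares fit of W = K/|m| to the (m, W) pairs. The sketch below uses the three pairs quoted above. The fitting criterion is an assumption, since the text does not say how the hyperbola in Figure 18 was fitted, and with so few points the resulting K should be read as indicative only.

```python
def fit_hyperbola(slopes, windows):
    """Least-squares K in W = K / |m| for paired slopes m and window sizes W."""
    if any(m == 0 for m in slopes):
        raise ValueError("slopes must be non-zero")
    inv = [1.0 / abs(m) for m in slopes]
    # Minimizing sum (W_i - K * inv_i)^2 gives K = sum(W_i * inv_i) / sum(inv_i^2).
    return sum(w * x for w, x in zip(windows, inv)) / sum(x * x for x in inv)

def predict_window(m, k):
    """Window size suggested by the heuristic m x W ~ constant."""
    return k / abs(m)

if __name__ == "__main__":
    # (m, W) pairs quoted in the text for three movies
    slopes = [-2.5e-3, -2.4e-4, -7.4e-5]
    windows = [15, 340, 400]
    k = fit_hyperbola(slopes, windows)
    print(f"K = {k:.4f}")
    print(f"suggested W for m = -1e-4: {predict_window(-1e-4, k):.0f}")
```

Fitting in linear W space weights the large-W movies most heavily; fitting log W against log |m| would weight the three movies more evenly.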


6  SUMMARY  AND  CONCLUSIONS 

Modeling  VBR  video  is  a  difficult  task  due  to  the  complex  statistical  characteristics  of  this 
type  of  traffic.  In  this  paper  we  have  considered  the  modeling  of  an  MPEG-1  source. 

We have presented a Markov model which, with adequate tuning of its parameters (mainly the window size), is able to provide an accurate representation of different kinds of MPEG-1 sources. Specifically, the sources considered in this work can be subdivided into at least two groups: movies and sports events. Dependencies between the frames of movies disappear only after minutes, while dependencies in sports events last only for seconds. Sharp differences also exist among movies (see Figure 4).

Figure 18 Relationship between the slope of the autocorrelation function, m, and the window size W (x-axis: slope).

We have shown that, at least for statistical multiplexing studies, the tail of the autocorrelation function (i.e., long-term correlations) cannot be neglected. Since this tail varies widely from source to source, a unique characterization of MPEG-1 sources is impossible: depending on the type of source (e.g., sports events, movies), the autocorrelation function of the sequence changes completely, and sources of the same type may also differ significantly.

We have presented and validated an approach for producing a precise model of a given MPEG-1 source. The main problem in applying our model is the selection of the window size: the various behaviours of the sources make it impossible to find a single window size for all cases. We have therefore identified a heuristic rule to overcome the window-size selection problem. Our heuristic is based on the observation that the product of the best window size and the decay speed of the autocorrelation function is almost independent of the movie.
Abstract

The Maximum Entropy Principle is used to derive an approximate expression for the burst length of a tagged call at the output of an ATM switch. The statistical multiplexer is approximated as a variable-server, infinite-buffer queueing system whose only clients are the cells of the tagged call, each incoming cell finding the server on vacations of random length. Numerical experiments are carried out and compared with simulation results.
1  INTRODUCTION

Asynchronous Transfer Mode (ATM) is expected to be the carrier mode for the Broadband Integrated Services Digital Network (B-ISDN). In B-ISDN, different calls will have different call characteristics, such as peak rate and average rate, and different QOS requirements, such as packet loss and packet delay. Optical fibre communication, regarded as a suitable medium for B-ISDN applications, provides Bit Error Rates (BER) as low as 10^-9 to 10^-10. Hence ATM, which provides a cell-based, connection-oriented network service, is an ideal transport for B-ISDN services over low-error fibre optic media. Connection-oriented network service is preferred over connectionless network service because the former demands less processing overhead at intermediate switches than the latter.

Due to the “bursty” nature of B-ISDN applications, statistical multiplexing of calls is preferred for its efficient utilization of bandwidth and buffer resources. Statistical multiplexing, however, degrades QOS parameters such as the mean and standard deviation of cell delay and the cell loss, because of congestion at intermediate ATM switches. Reactive controls, such as end-to-end flow control, are commonly used in low-speed networks like X.25. In the ATM environment, reactive congestion controls may not be effective because of the large bandwidth-distance product. Preventive congestion controls such as Call Admission Control (CAC) and Usage Parameter Control (UPC) have been proposed for avoiding congestion in ATM networks. With Call Admission Control in place, each intermediate switch in the pre-determined path of the call is required to determine whether the incoming call can be served with the demanded QOS parameters without affecting the QOS of existing calls. If the call can be accepted, the switch forwards the “call request” to the next switch; otherwise the switch sends a “call reject” back to the source, in which case the source may hunt for another route for the call.
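The hop-by-hop call set-up just described can be sketched as follows. The admission test used here, peak-rate allocation (accept if the admitted peak rates still fit within the link capacity), is a placeholder assumption: the paragraph does not specify a CAC rule, and statistical CAC schemes would admit more calls. Releasing upstream reservations on rejection is also an assumption, and `Switch` and `set_up_call` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Switch:
    name: str
    capacity: float  # output-link capacity (e.g. Mbit/s)
    admitted: List[float] = field(default_factory=list)  # peak rates of admitted calls

    def can_accept(self, peak_rate: float) -> bool:
        # Placeholder CAC rule (assumption): peak-rate allocation.
        return sum(self.admitted) + peak_rate <= self.capacity

def set_up_call(path: List[Switch], peak_rate: float) -> Optional[str]:
    """Forward a call request hop by hop along a pre-determined path.

    Returns None if every switch accepts, otherwise the name of the switch
    that sends the call reject back to the source.
    """
    reserved: List[Switch] = []
    for sw in path:
        if not sw.can_accept(peak_rate):
            for prev in reserved:  # assumption: release upstream reservations
                prev.admitted.remove(peak_rate)
            return sw.name
        sw.admitted.append(peak_rate)  # accept and forward to the next switch
        reserved.append(sw)
    return None
```

On a rejection, the source could call `set_up_call` again with another path, mirroring the route hunting described above.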

Most of the literature on performance modeling and evaluation of ATM networks deals with a single link or an isolated switching node. End-to-end performance analysis of large-scale broadband integrated networks is essential not only for implementing Call Admission Control procedures but also for understanding the efficiency and financial viability of the network as a whole. As in any interconnected network, the output of an upstream node is the input to the next node, and hence knowledge of the output characteristics of an ATM switch is essential. The exact characterization of the output process of an ATM switch is complex and intractable because of the statistical multiplexing of various classes of multimedia traffic. Moreover, some intermediate nodes may be fed by the output streams of more than one upstream node.

Most approaches proposed in the literature for characterizing output processes are approximations and have only limited applicability in call admission control and end-to-end performance analysis. Y. Ohba, et al [ohba 91] consider an ATM switch in the presence of three kinds of traffic: a GI-stream, batch arrivals and a set of IPP sources. A transient expression for the queue length distribution at the arrival instants of cells from the GI-stream is developed. Using that queue length distribution, the waiting time distribution and the inter-departure time distribution of cells from the GI-stream are obtained. Although, in principle, the same transient expression can be used iteratively to obtain the steady-state queue length distribution, this may not be practical for larger systems. I. Stavrakakis [stav 91] developed models for bursty traffic that undergoes splitting and merging. Specifically, three different models were proposed and compared for bursty traffic that is split, with cells routed into the tagged direction with probability p and diverted away from it with probability (1-p). The merging of bursty traffic is characterised as another bursty process in terms of the probabilities of the queue being empty and not empty. This work also analyzes the output processes at intermediate switches in a system of interconnected switches under the assumption that the input at any switch is only a fraction p of the output from the previous switch, i.e., each cell is sent in the targeted direction with probability p. For bursty traffic, this assumption may not be valid.

In certain B-ISDN applications, jitter is one of the QOS parameters. Specifically, for real-time applications like audio, jitter is required to be low so that the audio can be replayed properly at the destination. W. Matragi, et al [mat 94-1, mat 94-11] modeled the jitter of a call at the output as the difference in queue lengths at the departure instants of consecutive cells. They considered the jitter process of a GI-stream of customers in the presence of a batch arrival process. In [mat 94-1], the Z-transform of the jitter of a GI-stream at the output of a single node is obtained. This is extended in [mat 94-11] to estimate the end-to-end jitter incurred by periodic traffic in an ATM network. In [rob 92, boy 92], the influence of jitter on peak rate enforcement and usage parameter control algorithms is studied; because of intermittent clumping of cells, usage parameter control algorithms need to be more complex. I. Cidon, et al [cid 94] obtained analytical expressions for the maximum cell delay in a message and for the number of cells in a message whose delay exceeds pre-specified time thresholds. These expressions can be solved recursively.

In [wan 93], J.L. Wang, et al considered a two-queue priority system in which real-time traffic is given priority over non-real-time traffic. The probability distributions of the inter-departure times of cells from each queue are obtained.

To overcome the difficulties of characterizing the output of an ATM switch, almost all the call admission control procedures and performance analyses reported in the literature assume “nodal decomposition”: to decide whether to accept a call, the intermediate switches in the path use the call characteristics as they appear at the source, which in effect assumes that these characteristics are not affected by upstream switches. However, there has been little research (except [lau 93]) on validating this assumption. In [lau 93], the authors addressed the validation of the nodal decomposition approach through extensive simulations. Both homogeneous and heterogeneous ON-OFF sources are considered to study the input-to-output distortion of individual traffic sources as a function of the peak rate of each source and the overall load factor. The authors also studied the cross-correlations among the output sources. The paper summarizes two conditions under which nodal decomposition can be applied in network-wide performance modeling:

1. If the peak access rate of each source does not exceed 5% of the total link capacity, source distortion will be negligible.

2. If no more than 10% of the departing sources go to the same immediate downstream link, inter-source cross-correlation will have negligible effects on the queueing performance of the downstream nodes.
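The two rules of thumb reported from [lau 93] translate directly into a check like the one below. The thresholds (5% and 10%) come from the text; the function and parameter names are hypothetical, and the check only restates the rules, it does not validate them.

```python
from collections import Counter

def nodal_decomposition_ok(peak_rates, link_capacity, downstream_links,
                           max_peak_fraction=0.05, max_shared_fraction=0.10):
    """Check the two conditions quoted from [lau 93].

    peak_rates: peak access rate of each source (same units as link_capacity)
    downstream_links: immediate downstream link taken by each departing source
    Returns (negligible_distortion, negligible_cross_correlation).
    """
    if not peak_rates or not downstream_links:
        raise ValueError("need at least one source")
    rule1 = all(r <= max_peak_fraction * link_capacity for r in peak_rates)
    largest_group = max(Counter(downstream_links).values())
    rule2 = largest_group <= max_shared_fraction * len(downstream_links)
    return rule1, rule2
```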

From the congestion control and call management points of view, burstiness is one of the important properties of traffic; knowing it helps in designing call admission control procedures with better utilization of buffer and bandwidth resources. Friesen and Wong [frie 93] considered an interconnection of user nodes, fed by multiple traffic sources, and switch nodes. In the presence of bursty traffic, they analyzed the mean queue lengths and mean delays at every user node and switch node. They observed that the mean queue length and mean delay are larger at the user node than at the first switch node, and similarly larger at the first switch node than at the second. Smoothing, or burst reduction, of bursty sources is claimed as the reason for this, and the smoothing effect increases at higher load. S. Low and P. Varaiya [low 91, low 93] defined the burstiness of traffic in terms of the buffer required at the server for a given service rate. Using a deterministic fluid flow model, they show that both fixed-rate and leaky bucket servers are burst reducing.
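Low and Varaiya's burstiness measure, the buffer a server of a given rate needs, can be illustrated with a slotted approximation of their fluid model. The sketch below is only an illustration under that discretization assumption: it computes the peak backlog for a given service rate and the output of a fixed-rate server, so that the buffer needed before and after the server can be compared on an example. It does not reproduce their proof.

```python
def required_buffer(arrivals, rate):
    """Peak backlog of a work-conserving server draining `rate` per slot."""
    backlog = peak = 0.0
    for a in arrivals:
        backlog = max(0.0, backlog + a - rate)
        peak = max(peak, backlog)
    return peak

def fixed_rate_output(arrivals, rate):
    """Work leaving a fixed-rate server in each slot."""
    backlog, out = 0.0, []
    for a in arrivals:
        sent = min(backlog + a, rate)
        backlog += a - sent
        out.append(sent)
    return out

if __name__ == "__main__":
    arrivals = [4, 0, 0, 4, 0, 0]
    out = fixed_rate_output(arrivals, 2)
    # Buffer needed by a downstream server of rate 1.5, before and after smoothing
    print(required_buffer(arrivals, 1.5), required_buffer(out, 1.5))
```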

We consider a queue with N ON-OFF sources as input, served by a slotted channel. Given the characteristics of each call at the input side, we obtain expressions for the density function of its burst length at the output side. The inter-departure time between cells of a call within a burst, and hence the length of a burst at the output side, depends not only on the instantaneous queue length but also on the instantaneous states of all the calls at the input side. This is modeled using the Maximum Entropy principle. The queue length distribution can be obtained by approximating the multiplexed traffic at the input of the queue as a 2-state Markov Modulated Poisson Process (MMPP) [hef 86].
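As a generic illustration of the Maximum Entropy principle (not the constraint set developed later in this paper): among all distributions on a finite support with a prescribed mean, the entropy-maximizing one has the form p_k proportional to exp(-λ x_k), and since the mean decreases monotonically in λ, λ can be found by bisection. The function name and the search bracket for λ are assumptions of this sketch.

```python
import math

def max_entropy_with_mean(support, target_mean, tol=1e-12):
    """Maximum-entropy distribution on `support` with a prescribed mean.

    The solution has the form p_k ~ exp(-lam * x_k); the mean decreases
    monotonically in lam, so lam is found by bisection. The bracket
    [-50, 50] is an assumption that suits supports of moderate scale.
    """
    if not min(support) < target_mean < max(support):
        raise ValueError("target mean must lie strictly inside the support range")

    def dist(lam):
        exps = [-lam * x for x in support]
        shift = max(exps)  # avoid overflow in exp
        w = [math.exp(e - shift) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), support))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid  # mean too high: increase lam
        else:
            hi = mid
    return dist(0.5 * (lo + hi))
```

With the mean set to the midpoint of a symmetric support, λ is zero and the result is uniform, as expected when the constraint adds no information.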

The problem addressed in this paper differs from those in the earlier literature [mat 94-1, mat 94-11, ohba 91, stav 91, wan 93]. W. Matragi, et al [mat 94-1, mat 94-11] and Ohba, et al [ohba 91] considered only a GI-stream in the presence of batch traffic. I. Stavrakakis [stav 91] and J.L. Wang, et al [wan 93] considered only combined output characteristics. The output characterization of individual ON-OFF sources is important for applications in telephone and data networks, among others. Also, to the best of the authors’ knowledge, the use of the Maximum Entropy principle to estimate the service time density function is new.

In this paper, we analyze the distribution of the burst length of a tagged ON-OFF source at the output of a multiplexer with an infinite buffer. The input to the multiplexer is a set of heterogeneous or homogeneous ON-OFF sources. Section 2 presents the model as an infinite-buffer queue fed by arrivals from a number of ON-OFF sources. The other sources affect the output characteristics of the tagged call in two ways. First, the inter-departure time of two successive cells within a burst of the tagged call depends on the number of sources that are in the ON state at that instant. Section 3 introduces the notion of the instantaneous bandwidth available to the tagged call, which models the number of sources in the ON state at that instant; this section also presents the use of the Maximum Entropy principle to estimate the density function of the instantaneous bandwidth. Second, the inter-departure time of successive cells within a burst of the tagged call also depends on the queue length distribution, which in turn depends on the state of the other sources. In Section 4, a modified queue model with a variable server is presented. The input to this queue is the cells of the tagged call. The variable service time of the server models the instantaneous bandwidth available to the tagged call, and the server is assumed to go on vacation at the beginning of each ON state, which models the dependence on the queue length distribution. In Section 5, the density function of the output burst length is analysed. Some numerical examples are presented in Section 6 and compared with simulation results. Section 7 gives concluding remarks.


2  MODEL  DESCRIPTION 

We consider an ATM statistical multiplexer with an infinite buffer serving N ON-OFF sources, each generating cells of constant size. The multiplexer is served by a single channel with capacity C bits/sec. The channel is slotted, with slot size equal to the service time of a cell. This multiplexer buffer can be modeled as a discrete-time single-server system.

Each ON-OFF source alternates between ON and OFF states. During the ON state, source i ($i = 1, \ldots, N$) generates traffic at a constant rate $R_i$ bits/sec. Without loss of generality, we consider the size of a cell to be 53 bytes (the ATM standard). Each source is modeled as a discrete source that can be described completely at the time instants $\tau_0, \tau_1, \ldots, \tau_{j-1}, \tau_j, \tau_{j+1}, \ldots$, where $a_i = \tau_{n+1} - \tau_n = 53 \times 8 / R_i$ sec for all n. At an arbitrary instant $\tau_n$, if source i is in the ON state, it will continue to be in the ON state at time instant $\tau_{n+1}$ with probability $\alpha_i$, and with probability $(1 - \alpha_i)$ it will switch to the OFF state at $\tau_{n+1}$. Similarly, if the source is in the OFF state at the instant $\tau_n$, it will continue to be in the OFF state at the instant $\tau_{n+1}$ with probability $\beta_i$ and switch to the ON state with probability $(1 - \beta_i)$. The source emits a cell of size 53 bytes at the time instant $\tau_n$ if it is in the ON state at that instant. Let $\theta_{ON}^i$ be the average number of cells emitted by source i during an ON period and $\theta_{OFF}^i$ the average length of an OFF period in units of cell times. Then we get

$$\theta_{ON}^i = \frac{1}{1 - \alpha_i}, \qquad \theta_{OFF}^i = \frac{1}{1 - \beta_i}.$$

So the average traffic load of source i, $R_{av}^i$, is given by

$$R_{av}^i = R_i \, \frac{\theta_{ON}^i}{\theta_{ON}^i + \theta_{OFF}^i}.$$
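For the geometric ON-OFF source described above, the mean ON length is 1/(1 - α_i) cells, the mean OFF length is 1/(1 - β_i) cell times, and the average load is R_i θ_ON/(θ_ON + θ_OFF). The sketch below computes these quantities and includes a small simulator of the two-state chain that can be used to check them empirically; names such as `on_off_stats` are illustrative.

```python
import random

CELL_BITS = 53 * 8  # ATM cell size in bits

def cell_interval(peak_rate):
    """a_i: time between cells of a source while ON (seconds, rate in bits/s)."""
    return CELL_BITS / peak_rate

def on_off_stats(alpha, beta, peak_rate):
    """Mean ON length (cells), mean OFF length (cell times) and average load.

    alpha: probability of staying ON at the next instant
    beta:  probability of staying OFF at the next instant
    Both sojourn times are geometric, so their means are 1/(1 - p).
    """
    theta_on = 1.0 / (1.0 - alpha)
    theta_off = 1.0 / (1.0 - beta)
    r_av = peak_rate * theta_on / (theta_on + theta_off)
    return theta_on, theta_off, r_av

def simulate_on_off(alpha, beta, n_instants, rng):
    """Cell emissions (1 or 0) at instants tau_0, tau_1, ...; starts ON."""
    on, cells = True, []
    for _ in range(n_instants):
        cells.append(1 if on else 0)
        # ON stays ON w.p. alpha; OFF turns ON w.p. 1 - beta
        on = (rng.random() < alpha) if on else (rng.random() >= beta)
    return cells
```

The fraction of instants with a cell, `sum(cells) / len(cells)`, should approach θ_ON/(θ_ON + θ_OFF) for long runs.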
Consider now the inter-cell departure time for cells belonging to the same ON period of source i. This inter-cell departure time depends not only on the number of cells of other sources served between two cells of source i, but also on the queue length at the departure time of the first cell of the tagged cell pair. The number of cells of other sources served between the tagged cell pair is a random variable and depends on the number of other sources that are in the ON state at that instant. Using this, the statistical multiplexer can be approximated as an infinite-buffer queue (with only the tagged ON-OFF source i as input) served by a server with a random service rate $u_i$; the server is also assumed to go on vacation before starting service of a cell. The variable service rate takes into account the fact that the effective instantaneous bandwidth available to the cells of the tagged source is variable and depends on the number of ON-OFF sources in the ON state at that instant. The vacation period of the server is also a random variable and accounts for the fact that, before service of a cell begins, the cells already waiting in the multiplexer need to be served.


3  INSTANTANEOUS  BANDWIDTH 

In the statistical multiplexer, we consider the service of cells belonging to the same ON state of source i. Specifically, between two cells of the same ON state of source i, cells belonging to other sources will also be served, depending on the states of those sources. If the number of other-source cells between two cells of source i is large enough that their total service time exceeds $a_i$, then the instantaneous inter-departure time between the cells of source i exceeds $a_i$ and is equal to the total service time of the cells queued between those two cells. If this number is small enough that the total service time of those cells is smaller than $a_i$, there are two possible cases:

•  If  the  next  cell  of  source  i  has  arrived  before  the  departure  of  the  previous  cell,  then  the  interdeparture  time 
between  the  cells  of  source  i  will  be  equal  to  the  service  time  of  cells  belonging  to  the  other  sources. 

•  Otherwise,  the  inter-departure  time  between  the  cells  of  source  i  will  be  equal  to  a.i. 

The number of cells of other sources arriving between the cells of source i depends on which sources are ON
at that instant and on their peak rates. Consider the specific situation in which all N sources are in the ON state.
Here, the average number of cells of other sources queued between two cells of source i can be calculated as follows:

The average number of cells belonging to other sources
The service time required to serve these cells
So, the average inter-departure time between the cells belonging to source i

where $u_i = \dfrac{C R_i}{\sum_{j=1}^{N} R_j}$ is the instantaneous channel bandwidth of source i.
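The all-ON calculation above can be illustrated numerically. The sketch below is a hedged reconstruction under stated assumptions that the text does not spell out: peak rates in bit/s, a cell of 53 × 8 = 424 bits, $a_i = 424/R_i$, and an inter-departure time that includes the tagged cell's own service at rate C. Under these assumptions the implied bandwidth reproduces $u_i = C R_i / \sum_j R_j$. The function name `all_on_interdeparture` is illustrative only.

```python
CELL_BITS = 53 * 8  # one ATM cell, in bits


def all_on_interdeparture(rates, i, capacity):
    """All N sources ON: cells of other sources between two cells of source i, and u_i.

    Assumptions (not stated explicitly in the text): rates in bit/s, a_i = CELL_BITS / R_i,
    and the inter-departure time is the service of the other sources' cells plus the
    tagged cell itself, all at channel rate C. Only meaningful when the result exceeds
    a_i (the first case described above).
    """
    a_i = CELL_BITS / rates[i]                       # cell spacing of source i
    other_cells = a_i * sum(r for j, r in enumerate(rates) if j != i) / CELL_BITS
    service_other = other_cells * CELL_BITS / capacity
    interdeparture = service_other + CELL_BITS / capacity
    u_i = CELL_BITS / interdeparture                 # bandwidth implied by that spacing
    return other_cells, service_other, interdeparture, u_i


rates = [20e6] * 13          # 13 homogeneous calls at 20 Mbit/s
C = 155e6                    # channel rate, bit/s
n_other, t_other, d, u = all_on_interdeparture(rates, 0, C)
print(n_other, d, u, C * rates[0] / sum(rates))
```

With 13 calls of 20 Mbit/s on a 155 Mbit/s channel, the other sources contribute 12 cells between two cells of source i, and the resulting spacing exceeds $a_i$, so the first case applies.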

Consider another situation where only source i is ON. Then no other cells will be queued between two cells
of source i. In this case, the instantaneous channel bandwidth of source i will be $u_i = C$.

The instantaneous bandwidth $u_i$ available to source i can be defined as the state of the system with respect
to source i; it is a discrete random variable that can take up to $2^{N-1}$ values. Analysis involving a discrete
random variable with such a large state space may not be practical. When the rate, for all j, is large enough,
the instantaneous bandwidth $u_i$ can be approximated as a continuous random variable. If the distribution
of $u_i$ is known, the effect of all other sources on source i has been characterized.

Approximating a discrete random variable by a continuous one involves obtaining the density function of the
continuous random variable with the point probabilities as constraints. In principle, this can be formulated as
a Maximum Entropy problem with the density function as the optimizing variable and the point probabilities
of the discrete random variable as the constraints. Because the constraint set is large, this problem is
complex; we simplify it by considering only a small, fixed set of constraints.

3.1  Maximum  Entropy  Principle 

Consider the instantaneous bandwidth available to source i as the state of the system. Dropping the subscript
i, we denote it by u. Assume that we know only its minimum, maximum and average. Given this, the
Maximum Entropy principle [shor 80, jay 57, wil 70, fer 70, kou 94] can be used to estimate the density function
of u. For the last four decades, the Maximum Entropy principle has been used in various engineering fields, such as
operations research, transportation and queueing theory, to estimate the state probability distribution in
the absence of complete information about the state of the system. More recently, the Maximum Entropy principle
has also found applications in ATM networks: Kouvatsos et al. [kou 94] used it
to estimate the queue length distribution of the statistical multiplexer. In this paper, we use the Maximum Entropy
principle to estimate the density function of the instantaneous bandwidth available when we know only its average
(together with its bounds). The Maximum Entropy principle "chooses" the density function that maximizes the entropy
subject to the given information as constraints. In other words, if p(u) is the density function of u, we find p(u) by maximizing

the entropy

$$H(u) = -\int_{u_1}^{u_2} p(u)\,\ln p(u)\,du \tag{1}$$

such that

$$\int_{u_1}^{u_2} p(u)\,du = 1 \tag{2}$$

$$\int_{u_1}^{u_2} u\,p(u)\,du = \bar{u} \tag{3}$$

where

$u_1$ = minimum value that u can attain,
$u_2$ = maximum value that u can attain (here $u_2 = C$, the channel capacity),
$\bar{u}$ = average of u.

Clearly, the p(u) obtained from the above may not be the true density function of u; the available
information about u may not be sufficient to determine the actual density function. The Maximum Entropy
principle estimates a density function that satisfies the given information but is maximally noncommittal about
whatever is not known. Moreover, if the density function is re-estimated with additional information, the result
may differ from the one obtained previously. Hence the density function obtained from the
Maximum Entropy principle is an approximation to the true density of the system state u.

The solution of the above set of equations is discussed in the Appendix and is given by

$$p(u) = e^{\lambda_1 - 1}\,e^{\lambda_2 u} \tag{4}$$

where $\lambda_1$, $\lambda_2$ can be obtained from

$$e^{\lambda_1 - 1}\left(e^{\lambda_2 u_2} - e^{\lambda_2 u_1}\right) = \lambda_2$$

$$\frac{u_2 e^{\lambda_2 u_2} - u_1 e^{\lambda_2 u_1}}{e^{\lambda_2 u_2} - e^{\lambda_2 u_1}} - \frac{1}{\lambda_2} = \bar{u} \tag{5}$$

It is argued in the Appendix that Eq. (5) has a unique solution for $\lambda_2$. It can also be observed that for moderate
to high load conditions ($0.4 < \rho < 0.99$), where the average instantaneous bandwidth $\bar{u} < (u_1 + u_2)/2$, the solution $\lambda_2$
is negative. Since, as reported in the literature, the input-output distortion is negligible at low load, we
consider here only the case $\lambda_2 < 0$.

Rewriting Eq. (5), we get

$$\frac{u_2 - u_1}{u_2 - \bar{u} - \dfrac{1}{\lambda_2}} = 1 - e^{\lambda_2 (u_2 - u_1)} \tag{6}$$

Since $\lambda_2$ is negative,

$$e^{\lambda_2 (u_2 - u_1)} < 1 \tag{7}$$

Assuming that the term $e^{\lambda_2 (u_2 - u_1)}$ in Eq. (6) is negligibly small, the solution of Eq. (6) is given by

$$\lambda_2 \approx \frac{1}{u_1 - \bar{u}} \tag{8}$$

We observed that this assumption holds, with varying accuracy, in many of the examples we considered.
The ratio of the channel capacity to the peak rate of the call is one of the factors that affect its validity.
Although an exact condition for validity is yet to be derived, an empirical condition can be obtained
by requiring that $e^{\lambda_2 (u_2 - u_1)}$ be negligible.

For $e^{\lambda_2 (u_2 - u_1)}$ to be negligibly small,

$$\left|\lambda_2 (u_2 - u_1)\right| > 10$$

But in this case, $\lambda_2$ is given by Eq. (8). Hence the empirical condition for the validity of the above assumption
is given by

$$\left|\frac{u_2 - u_1}{u_1 - \bar{u}}\right| > 10 \tag{9}$$
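A small sketch of Eq. (8) and the empirical test (9), assuming normalized bandwidths ($u_2 = C = 1$) and illustrative values of $u_1$ and $\bar{u}$ that do not come from the paper. It also reports the term $e^{\lambda_2(u_2-u_1)}$ that Eq. (8) neglects, which shows why the ratio in (9) matters. The function names are hypothetical.

```python
import math


def lambda2_approx(u1, u_bar):
    """Eq. (8): lambda_2 ~ 1 / (u1 - u_bar); negative whenever u_bar > u1."""
    return 1.0 / (u1 - u_bar)


def approx_is_valid(u1, u2, u_bar, threshold=10.0):
    """Empirical condition (9): |(u2 - u1) / (u1 - u_bar)| > threshold."""
    return abs((u2 - u1) / (u1 - u_bar)) > threshold


def neglected_term(u1, u2, u_bar):
    """Size of e^{lambda_2 (u2 - u1)} at the Eq. (8) value, i.e. what Eq. (8) drops."""
    return math.exp(lambda2_approx(u1, u_bar) * (u2 - u1))


# Illustrative values in units of C (u2 = C = 1); these numbers are hypothetical.
for u_bar in (0.25, 0.5):
    print(u_bar, lambda2_approx(0.2, u_bar), approx_is_valid(0.2, 1.0, u_bar),
          neglected_term(0.2, 1.0, u_bar))
```

When the ratio in (9) is 16 the dropped term is of order $10^{-7}$; when it falls to about 2.7 the term is several percent, which is when the numerical solution described next becomes necessary.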


In case the above condition is not satisfied, Eq. (5) can be solved numerically for $\lambda_2$. A successive
approximation algorithm is used to evaluate $\lambda_2$ iteratively, with the initial guess given by Eq. (8). The main
advantage of this algorithm is its insensitivity to the initial guess. The necessary condition for this algorithm to
converge is an upper bound on $e^{\lambda_2 (u_2 - u_1)}$, which can be satisfied for all moderate to heavy load conditions.
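The paper's own iteration is not reproduced in the text above, so the following is only one plausible successive-approximation scheme, labelled as such: rearranging Eq. (6) gives $\lambda_2 \leftarrow 1 / \big(u_2 - \bar{u} - (u_2-u_1)/(1 - e^{\lambda_2(u_2-u_1)})\big)$, started from Eq. (8). For $u_1 < \bar{u} < (u_1+u_2)/2$ the right-hand side is increasing in $\lambda_2$ and the first step moves upward, so the iterates rise monotonically towards the root. The helper `me_mean` (a hypothetical name) evaluates the left-hand side of Eq. (5) in an overflow-safe form so the residual can be checked.

```python
import math


def me_mean(lam, u1, u2):
    """Mean of p(u) ~ exp(lam * u) on [u1, u2]; equals the left-hand side of Eq. (5)."""
    delta = u2 - u1
    return u2 + delta / math.expm1(lam * delta) - 1.0 / lam


def solve_lambda2(u1, u2, u_bar, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration on Eq. (6), started from the Eq. (8) guess (an assumed scheme)."""
    if not u1 < u_bar < (u1 + u2) / 2:
        raise ValueError("expects u1 < u_bar < (u1 + u2) / 2, i.e. lambda_2 < 0")
    delta = u2 - u1
    lam = 1.0 / (u1 - u_bar)                      # initial guess, Eq. (8)
    for _ in range(max_iter):
        # -expm1(x) = 1 - exp(x), computed accurately even when x is close to 0
        new = 1.0 / (u2 - u_bar - delta / -math.expm1(lam * delta))
        if abs(new - lam) <= tol * abs(lam):
            return new
        lam = new
    raise RuntimeError("successive approximation did not converge")


lam = solve_lambda2(0.0, 1.0, 0.4)               # illustrative, normalized values
print(lam, me_mean(lam, 0.0, 1.0))
```

Convergence is fast when condition (9) holds and slows down as $\bar{u}$ approaches the midpoint $(u_1+u_2)/2$, where $\lambda_2 \to 0$.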
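The minimum, $u_1$, of the state of the system is the smallest possible share of the channel bandwidth for source i;
it occurs when the instantaneous load is the largest possible, i.e. when all N sources are in the ON state. So

$$u_1 = \frac{C R_i}{\sum_{j=1}^{N} R_j}$$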


Similarly, the maximum, $u_2$, of the state of the system is the largest possible share of the channel bandwidth
for source i; it occurs when the instantaneous load is the smallest possible, i.e. when all the sources except the tagged
source i are in the OFF state.

Let $v = (x_1, x_2, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_N)$ be the combined state of all sources given that source i is in the ON
state, where

$$x_j = \begin{cases} 0 & \text{if source } j \text{ is in the OFF state} \\ 1 & \text{otherwise} \end{cases}$$

Also let $u_i(v)$ and $p(v)$ be, respectively, the instantaneous bandwidth available to source i when the sources are in state v
and the joint probability that the sources are in state v. Then the expected value $\bar{u}$ of the instantaneous
bandwidth available to source i when it is in the ON state can be obtained as

$$\bar{u} = \sum_{v} u_i(v)\, p(v)$$

Since all the sources are independent of each other, we can write

$$p(v) = \prod_{j=1}^{N} p(x_j)$$

where $p(x_j)$ is the probability that source j is in the ON state if $x_j = 1$, or in the OFF state if $x_j = 0$.
It can also be shown easily that

$$p(x_j = 1) = \frac{1 - \beta_j}{(1 - \alpha_j) + (1 - \beta_j)}$$

and

$$p(x_j = 0) = \frac{1 - \alpha_j}{(1 - \alpha_j) + (1 - \beta_j)}$$
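To make $u_1$, $u_2$ and $\bar{u}$ concrete, the sketch below enumerates all $2^{N-1}$ states of the other sources. It assumes a share rule that the text does not state explicitly, $u_i(v) = C R_i / \sum_j x_j R_j$, chosen because it reproduces $u_1$ when all sources are ON and $u_2 = C$ when only source i is ON; treat it as a hypothetical illustration. `p_on` implements $p(x_j = 1)$ above; both function names are illustrative.

```python
import itertools


def p_on(alpha, beta):
    """Stationary ON probability, (1 - beta) / ((1 - alpha) + (1 - beta))."""
    return (1 - beta) / ((1 - alpha) + (1 - beta))


def bandwidth_stats(rates, probs_on, i, capacity):
    """Return (u1, u2, u_bar) for tagged source i by enumerating all 2**(N-1) states.

    Hypothetical share rule (an assumption): u_i(v) = C * R_i / sum_j x_j R_j.
    """
    others = [j for j in range(len(rates)) if j != i]
    u_bar = 0.0
    for xs in itertools.product((0, 1), repeat=len(others)):
        prob = 1.0
        load = rates[i]                          # source i is ON in every state v
        for j, x in zip(others, xs):
            prob *= probs_on[j] if x else 1 - probs_on[j]
            load += x * rates[j]
        u_bar += capacity * rates[i] / load * prob
    u1 = capacity * rates[i] / sum(rates)        # all sources ON
    u2 = capacity                                # only source i ON
    return u1, u2, u_bar


# Homogeneous example: 13 calls of 20 Mbit/s, ON probability 0.4576, C = 155 Mbit/s
N, P = 13, 0.4576
u1, u2, u_bar = bandwidth_stats([20e6] * N, [P] * N, 0, 155e6)
closed_form = 155e6 * (1 - (1 - P) ** N) / (N * P)
print(u1, u2, u_bar, closed_form)
```

For homogeneous sources the enumeration can be cross-checked in closed form: with K the binomial number of other ON sources, $\bar{u} = C\,E[1/(1+K)] = C\,(1 - (1-p)^N)/(Np)$. In this example $\bar{u}$ lies well below $(u_1+u_2)/2$, so $\lambda_2 < 0$ as assumed above.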


4  INTER-CELL  DEPARTURE  TIME 

We now consider the infinite-buffer queue served by a single server with capacity u, where u is a random variable
with density function p(u). The customers of this queue are the cells belonging to source i. We also assume
that the server is on vacation at the time of arrival of each cell. The vacation period is a random variable,
v (> 0). Let $b = \dfrac{53 \times 8}{u}$ be the service time of a cell in this queueing system, with $f_b(b)$ and $B^*(s)$ as the density
function and Laplace transform of b, respectively.

The vacation period seen by an arriving cell of source i is the time required to serve the cells that are
waiting in the queue at the arrival instant of this cell. The inter-cell departure time at the output of this queue is
equivalent to the inter-departure time of cells belonging to source i from the multiplexer.

We consider two cases. In the first, the vacation period for the cell is so large that the next cell of the same ON
period (or burst) has arrived in the queue before service of this cell starts; the instantaneous inter-cell
departure time between the present cell and the next is then equal to the service time of the cell in the above queueing
system. Let us define $d_1$ as the inter-departure time given that the new cell has arrived before service of the previous
cell started. Then $d_1 = b$, and $f_{d_1}$ and $D_1^*(s)$ are the density function and Laplace transform of $d_1$, respectively.

In the other case, the vacation is small enough that service of the cell starts before the arrival of
the next cell of the same burst. Let us define $d_2$ as the inter-departure time between the cells in this case. Then
$d_2 = \max(b, a)$, where a is the interarrival time of cells of source i within a burst (the subscript i is dropped
for simplicity).

Since $d_2$ is a random variable, let us define $f_{d_2}(d)$ and $D_2^*(s)$ as the density function and Laplace transform of
$d_2$, respectively. This yields

$$f_{d_2}(d) = f_b(d)\,F_a(d) + F_b(d)\,f_a(d)$$

where $F_b(\cdot)$ is the distribution function of b, and $f_a(\cdot)$ and $F_a(\cdot)$ are the density and distribution functions of a,
respectively.

Since a is constant,

$$f_a(d) = \delta(d - a)$$

where δ is the Dirac delta function, and

$$F_a(d) = \begin{cases} 1 & \text{if } d \ge a \\ 0 & \text{otherwise} \end{cases}$$

So, rewriting,

$$f_{d_2}(d) = f_b(d)\,F_a(d) + F_b(d)\,\delta(d - a)$$

Then

$$D_2^*(s) = \int_0^\infty f_{d_2}(x)\,e^{-sx}\,dx = \int_a^\infty f_b(b)\,e^{-bs}\,db + F_b(a)\,e^{-as}$$


4.1 Vacation Period And Queue Length Distribution

In the previous section, we considered a server with vacations, where the vacation period equals the
service time required to serve all the cells ahead of the tagged cell of the tagged source in the multiplexer buffer.
The vacation period at an arbitrary time instant is the time required to serve a cell at the channel rate C multiplied by
the queue length at that instant. Therefore, the queue length distribution of the multiplexer is needed to
find the vacation period distribution.

The statistical multiplexing of N ON-OFF sources can be approximated by a 2-state MMPP, as proposed in
Heffes et al. [hef 86]. The queue length distribution of the MMPP/D/1 infinite-buffer queue may then be obtained
as proposed by Ramaswami [ram 80, ram 88] and Lucantoni [luc 91]. We define q as the queue length of the
multiplexer at cell departure instants and q(n) as its steady-state distribution.

Consider the probability that, at the start of the service time of a cell of source i in the multiplexer, the next
cell of the same ON state is also waiting. Consider the instant when service of a cell belonging to source i
starts. Let the queue length at that instant be q', with distribution $q'(n) = q(n - 1)$, for n = 1,
2, ...

Let S be the set of all possible combined states of all sources and $R_v$ the total arrival rate of cells into the
multiplexer when the combined state is v, i.e.

$$R_v = \sum_{j=1}^{N} x_j R_j, \qquad v \in S$$

Between two cell arrivals of source i, the number of cells of other sources that can arrive is given by $n_v = a R_v$.
If the queue length q' is greater than $n_v$ when service to a cell of source i starts, then another cell of the same
ON state is also waiting. Let $p_v$ denote the probability of this event given that the combined state of all sources
is v:

$$p_v = \sum_{n = n_v + 1}^{\infty} q'(n)$$

Since $n_v$ is a real number, it can be written as

$$n_v = n_I + n_F = n_I (1 - n_F) + (n_I + 1)\, n_F$$

where $n_I$ and $n_F$ are the integral and fractional parts of $n_v$, respectively.
Then

$$p_v = (1 - n_F)\, q'(n_I + 1) + \sum_{n = n_I + 2}^{\infty} q'(n)$$
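The interpolation rule for $p_v$ can be sketched as follows. The toy queue length distribution is invented for illustration only; in the paper q(n) would come from the MMPP/D/1 analysis. Indexing $q'$ from 0 with $q'(0) = 0$, and the function names, are conventions of this sketch.

```python
import math


def q_prime_from_q(q):
    """q'(n) = q(n - 1) for n >= 1, and q'(0) = 0 (a list indexed by n)."""
    return [0.0] + list(q)


def prob_next_cell_waiting(q_prime, n_v):
    """p_v for a real-valued n_v, interpolating between integer thresholds:

    p_v = (1 - n_F) q'(n_I + 1) + sum_{n >= n_I + 2} q'(n),
    where q_prime is a truncated list with q_prime[n] = P(q' = n).
    """
    n_int = math.floor(n_v)
    n_frac = n_v - n_int
    head = q_prime[n_int + 1] if n_int + 1 < len(q_prime) else 0.0
    tail = sum(q_prime[n_int + 2:])
    return (1 - n_frac) * head + tail


qp = q_prime_from_q([0.5, 0.25, 0.125, 0.125])   # toy departure-epoch QLD
print(prob_next_cell_waiting(qp, 1.0), prob_next_cell_waiting(qp, 1.5))
```

For an integer $n_v$ this is exactly $P(q' > n_v)$; for a fractional $n_v$ it is the linear interpolation between $P(q' > n_I)$ and $P(q' > n_I + 1)$, matching the decomposition of $n_v$ above.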

Let p denote the probability, averaged over the states v, that at the start of service of a cell belonging to source i,
the next cell of the same ON state has also arrived. Then

$$p = \sum_{v \in S} p_v\, p(v)$$

Now let us define d as the inter-departure time of cells of the same ON state of source i. Then d is given by

$$d = d_1 p + (1 - p)\, d_2$$

Let $f_d(d)$ and $D^*(s)$ be the density function and Laplace transform of d, respectively, where $D^*(s)$ is given by

$$D^*(s) = D_1^*(sp)\, D_2^*\big(s(1 - p)\big)$$


5 BURST LENGTH AT THE OUTPUT

We define the burst length of source i at the output of the multiplexer as the time from the start
of service of the first cell of the burst to the departure of the last cell of that burst. Let $br_n$ denote the
burst length at the output when there are n cells in the corresponding burst at the input. Assuming that,
within an ON state of the tagged source, variations in both the instantaneous channel bandwidth and the vacation
periods are negligible, $br_n$ can be approximated as

$$br_n = (n - 1)\, d$$

If $Br_n^*(s)$ is the Laplace transform of $br_n$,

$$Br_n^*(s) = D_1^*\big((n - 1)ps\big)\, D_2^*\big((1 - p)(n - 1)s\big) \tag{10}$$

But the probability that there are n cells in the burst at the input is $\alpha^{n-1}(1 - \alpha)$, for n = 1, 2, .... Let $\overline{br}$
denote the burst length at the output averaged over n, and $Br^*(s)$ the Laplace transform of $\overline{br}$, where

$$\overline{br} = \sum_{n=1}^{\infty} \alpha^{n-1}(1 - \alpha)\, br_n$$

Then

$$Br^*(s) = \prod_{n=1}^{\infty} Br_n^*\big(\alpha^{n-1}(1 - \alpha)s\big) \tag{11}$$

5.1 Average of Burst Length

Differentiating Eq. (11) with respect to s,

$$Br^{*\prime}(s) = \sum_{j=1}^{\infty} Br_j^{*\prime}\big(\alpha^{j-1}(1 - \alpha)s\big)\,\alpha^{j-1}(1 - \alpha) \prod_{n \ne j} Br_n^*\big(\alpha^{n-1}(1 - \alpha)s\big)$$

Substituting s = 0 (and using $Br_n^*(0) = 1$), we get

$$Br^{*\prime}(0) = (1 - \alpha)\sum_{j=1}^{\infty} \alpha^{j-1}\, Br_j^{*\prime}(0) \tag{12}$$

Differentiating Eq. (10) with respect to s, and substituting s = 0, we get

$$Br_n^{*\prime}(0) = (n - 1)\,p\,D_1^{*\prime}(0) + (1 - p)(n - 1)\,D_2^{*\prime}(0) \tag{13}$$

Substituting Eq. (13) in Eq. (12),

$$Br^{*\prime}(0) = (1 - \alpha)\sum_{j=1}^{\infty} \alpha^{j-1}(j - 1)\left[p\,D_1^{*\prime}(0) + (1 - p)\,D_2^{*\prime}(0)\right]$$

Now

$$D_1^{*\prime}(s) = B^{*\prime}(s) = -\int_0^\infty b\, f_b(b)\, e^{-bs}\, db$$

Substituting s = 0 in the above equation, we get

$$D_1^{*\prime}(0) = -\int_0^\infty b\, f_b(b)\, db = -\int_{u_1}^{u_2} \frac{53 \times 8}{u}\cdot\frac{\lambda_2\, e^{\lambda_2 u}}{e^{\lambda_2 u_2} - e^{\lambda_2 u_1}}\, du$$
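Similarly,

$$D_2^{*\prime}(s) = -\int_a^\infty b\, f_b(b)\, e^{-bs}\, db - a\,F_b(a)\, e^{-as}$$

The average burst length, $\overline{br}$, is then

$$\overline{br} = (\alpha - 1)\sum_{j=1}^{\infty} \alpha^{j-1}(j - 1)\left[p\,D_1^{*\prime}(0) + (1 - p)\,D_2^{*\prime}(0)\right] \tag{14}$$

where

$$D_1^{*\prime}(0) = -\int_{u_1}^{u_2} \frac{53 \times 8\,\lambda_2}{u\left[e^{\lambda_2 u_2} - e^{\lambda_2 u_1}\right]}\, e^{\lambda_2 u}\, du \tag{15}$$

$$D_2^{*\prime}(0) = -\int_{u_1}^{R_i} \frac{53 \times 8\,\lambda_2}{u\left[e^{\lambda_2 u_2} - e^{\lambda_2 u_1}\right]}\, e^{\lambda_2 u}\, du - a\,F_b(a) \tag{16}$$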
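Because $\sum_{j\ge1} \alpha^{j-1}(j-1) = \alpha/(1-\alpha)^2$, Eq. (14) reduces to $\overline{br} = \frac{\alpha}{1-\alpha}\left[-p\,D_1^{*\prime}(0) - (1-p)\,D_2^{*\prime}(0)\right]$, i.e. the mean number of gaps in a burst times the mean inter-departure time. The sketch below evaluates this with composite Simpson integration of Eqs. (15) and (16). Assumptions: rates in bit/s, $a = 424/R_i$ (so that $b \ge a$ exactly when $u \le R_i$), and illustrative values of $\lambda_2$, $u_1$, $u_2$ and p that are not taken from the paper; the function names are hypothetical.

```python
import math

CELL_BITS = 53 * 8


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with an even number n of panels."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + k * h) for k in range(1, n, 2))
    s += 2 * sum(f(lo + k * h) for k in range(2, n, 2))
    return s * h / 3


def mean_burst_length(lam, u1, u2, peak_rate, p, alpha):
    """Mean output burst length from Eqs. (14)-(16), in seconds (rates in bit/s)."""
    def dens(u):
        # Maximum Entropy density, Eq. (4), with e^{lambda_1 - 1} fixed by normalization
        return lam * math.exp(lam * (u - u1)) / math.expm1(lam * (u2 - u1))

    a = CELL_BITS / peak_rate                  # assumed cell spacing within a burst
    r = min(max(peak_rate, u1), u2)            # b >= a  <=>  u <= R_i
    e_d1 = simpson(lambda u: CELL_BITS / u * dens(u), u1, u2)      # -D1*'(0)
    e_d2 = (simpson(lambda u: CELL_BITS / u * dens(u), u1, r)
            + a * simpson(dens, r, u2))                            # -D2*'(0)
    return alpha / (1 - alpha) * (p * e_d1 + (1 - p) * e_d2)


# Illustrative parameters (not from the paper): lambda_2 = -1e-7 per bit/s
print(mean_burst_length(-1e-7, 12e6, 155e6, 20e6, 0.5, 0.95))
```

Since $d_2 = \max(b, a) \ge b = d_1$, a larger p (next cell already waiting) can only shorten the mean output burst in this approximation.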


6  NUMERICAL  RESULTS  AND  DISCUSSION 


In this section, we discuss two sets of numerical experiments that were carried out to gauge the accuracy of the
expressions derived in the previous sections. The average burst length at the output of the multiplexer is
calculated and compared with simulation results. In both experiments, we consider a channel bandwidth of
155 Mbit/s and a cell size of 53 bytes, and all calls have the same characteristics (i.e. homogeneous
calls).

In the first experiment, each call is described by peak rate $R_i$ = 20 Mbit/s, $\alpha_i$ = 0.95 and average/peak
rate ratio = 0.4576. The experiment was conducted with 3 different load factors, where the load factor ρ is defined as
the total average rate of the calls divided by the channel capacity, $\rho = N \times 0.4576 \times R_i / C$.

Burst length at the output (μs):

No. of calls   Load factor ρ   Simulation   Calculated with QLDs from simulation   Calculated with QLDs from 2-state MMPP approximation
13             0.767           410          411                                    428
15             0.8856          428          438                                    456
16             0.944           434          454                                    468
In the second experiment, each call has peak rate $R_i$ = 10 Mbit/s, $\alpha_i$ = 0.95 and average/peak
rate ratio = 0.4576.

Burst length at the output (μs):

No. of calls   Load factor ρ   Simulation   Calculated with QLDs from simulation   Calculated with QLDs from 2-state MMPP approximation
25             0.738           812          828                                    848
27             0.797           814          858                                    883
30             0.8856          824          896                                    934
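The experiment parameters can be checked with a little arithmetic. Assuming the load factor is the total average rate of the calls divided by C, $\rho = N \times 0.4576 \times R_i / C$, the sketch reproduces the load-factor column to within about $10^{-3}$ (the tabulated values look truncated). It also computes the mean input-side burst span $E[n-1]\,a = \frac{\alpha}{1-\alpha}\cdot\frac{424}{R_i}$, about 402.8 μs at 20 Mbit/s and 805.6 μs at 10 Mbit/s, which is consistent with reading the tabulated burst lengths in microseconds. The constant and function names are illustrative.

```python
CELL_BITS = 53 * 8
C = 155e6          # channel rate, bit/s
RATIO = 0.4576     # average/peak rate ratio
ALPHA = 0.95       # probability that a burst continues with another cell


def load_factor(n_calls, peak_rate):
    """rho = N * (average rate) / C, with average rate = RATIO * peak rate (assumed)."""
    return n_calls * RATIO * peak_rate / C


def input_burst_length_us(peak_rate):
    """E[(n - 1) a] for geometric bursts: alpha / (1 - alpha) * 424 / R, in microseconds."""
    return ALPHA / (1 - ALPHA) * CELL_BITS / peak_rate * 1e6


for n, r in [(13, 20e6), (15, 20e6), (16, 20e6), (25, 10e6), (27, 10e6), (30, 10e6)]:
    print(n, round(load_factor(n, r), 4), round(input_burst_length_us(r), 1))
```

The simulated output burst lengths in both tables sit slightly above these input-side values, as expected when multiplexing stretches the spacing between cells of a burst.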

It can be observed from the above tables that the percentage error in the burst length calculated with the Queue
Length Distribution (QLD) obtained from the 2-state MMPP approximation is larger than that in the burst length
calculated using the QLDs obtained from the simulations. This may be due to the fact that the probabilities of long
queue lengths are underestimated by the 2-state MMPP approximation. QLDs calculated from higher-order
MMPP approximations (MMPPs with more than 2 states) may reduce the percentage error.

It is also observed that as the number of calls increases, the percentage error in the burst lengths also increases.
This may be attributed to the loss of information about the higher-order moments of the instantaneous bandwidth
available to the tagged source in the Maximum Entropy approximation. To make this clearer, consider the
"occupied channel bandwidth" in the homogeneous case, which is the sum of the peak rates of the sources
in the ON state. The occupied channel bandwidth is a random variable that depends on another random
variable, the number of sources in the ON state at that instant. The second moment of the occupied channel
bandwidth depends on the second moment of the number of sources in the ON state, which is a second-order
polynomial in the total number of sources. So any increase in the total number of sources causes
the second moment of the occupied channel bandwidth to increase by a second-order polynomial factor. Hence, as
the total number of sources increases, the difference between the exact second moment and the second
moment estimated from the Maximum Entropy principle with only the first moment as a constraint increases by a
second-order polynomial factor. Since the occupied channel bandwidth and the instantaneous bandwidth available to the tagged
call are closely related, the same argument holds for the instantaneous bandwidth as well. Hence, as the number
of sources increases, more information about the higher-order moments is lost in the Maximum Entropy
approach.
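The quadratic dependence invoked above can be checked directly, assuming N independent homogeneous sources that are each ON with probability p, so that the number K of ON sources is binomial and $E[K^2] = Np(1-p) + N^2p^2$. The occupied bandwidth is $R K$, so its second moment is $R^2 E[K^2]$. The sketch (with hypothetical function names) compares the closed form with a direct sum over the binomial distribution.

```python
import math


def on_count_moments(n_sources, p_on):
    """Mean and second moment of K ~ Binomial(N, p_on), the number of ON sources."""
    mean = n_sources * p_on
    second = n_sources * p_on * (1 - p_on) + mean ** 2   # Var[K] + E[K]^2
    return mean, second


def second_moment_direct(n_sources, p_on):
    """E[K^2] summed term by term over the binomial distribution."""
    return sum(k * k * math.comb(n_sources, k) * p_on ** k * (1 - p_on) ** (n_sources - k)
               for k in range(n_sources + 1))


for n in (13, 15, 16, 25, 27, 30):
    print(n, on_count_moments(n, 0.4576)[1], second_moment_direct(n, 0.4576))
```

The $N^2p^2$ term dominates as N grows, which is the second-order growth referred to in the argument above.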


7  CONCLUSION 

An approximate expression is derived for the burst length of a tagged call at the output of an ATM switch by
approximating the statistical multiplexer as an infinite-buffer queueing system with a single variable-rate server and only cells
from the tagged call as customers. Each incoming cell also sees the server in a vacation period of random length.
The density function of the service rate of the server is approximated using the Maximum Entropy principle. Two
numerical examples are presented to gauge the accuracy of the approximation. Considering that
only the first moment is used as a constraint, the accuracy of the results is impressive. In the authors' opinion,
the main contributions of the paper are 1) the introduction of the Maximum Entropy principle for estimating the service
time density function and 2) the modelling of the statistical multiplexer as a variable-rate server queueing system with
server vacations. This approach can be extended further by including more constraints for a better estimate of
the density function of the instantaneous bandwidth available.
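APPENDIX 1 SOLUTION OF EQ. (1), EQ. (2) AND EQ. (3):

Using the Lagrangian principle,

$$F(p) = -\int_{u_1}^{u_2} p(u)\ln p(u)\,du + \lambda_1\left[\int_{u_1}^{u_2} p(u)\,du - 1\right] + \lambda_2\left[\int_{u_1}^{u_2} u\,p(u)\,du - \bar{u}\right] \tag{17}$$

where $\lambda_1$, $\lambda_2$ are Lagrangian coefficients.

Differentiating Eq. (17) with respect to p(u) and equating to 0,

$$\frac{\partial F}{\partial p} = -\int_{u_1}^{u_2}\left[\ln p(u) + p(u)\cdot\frac{1}{p(u)}\right]du + \lambda_1\int_{u_1}^{u_2} du + \lambda_2\int_{u_1}^{u_2} u\,du = 0$$

Rewriting,

$$\int_{u_1}^{u_2}\left[-\ln p(u) - 1 + \lambda_1 + \lambda_2 u\right]du = 0$$

Requiring the integrand to vanish for every u in $[u_1, u_2]$, we get

$$p(u) = e^{\lambda_1 - 1}\,e^{\lambda_2 u} \tag{18}$$

Substituting Eq. (18) in Eq. (2),

$$\int_{u_1}^{u_2} e^{\lambda_1 - 1}\,e^{\lambda_2 u}\,du = 1$$

we get

$$e^{\lambda_1 - 1}\left(e^{\lambda_2 u_2} - e^{\lambda_2 u_1}\right) = \lambda_2 \tag{19}$$

Substituting Eq. (18) in Eq. (3),

$$\int_{u_1}^{u_2} u\,e^{\lambda_1 - 1}\,e^{\lambda_2 u}\,du = \bar{u}$$

After simplification and substitution of Eq. (19), we get

$$\frac{u_2 e^{\lambda_2 u_2} - u_1 e^{\lambda_2 u_1}}{e^{\lambda_2 u_2} - e^{\lambda_2 u_1}} - \frac{1}{\lambda_2} = \bar{u}$$

So the probability density function estimated from the Maximum Entropy principle is given by

$$p(u) = e^{\lambda_1 - 1}\,e^{\lambda_2 u}$$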
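where $\lambda_1$, $\lambda_2$ can be obtained from

$$e^{\lambda_1 - 1}\left(e^{\lambda_2 u_2} - e^{\lambda_2 u_1}\right) = \lambda_2$$

$$\frac{u_2 e^{\lambda_2 u_2} - u_1 e^{\lambda_2 u_1}}{e^{\lambda_2 u_2} - e^{\lambda_2 u_1}} - \frac{1}{\lambda_2} = \bar{u} \tag{20}$$

Evaluation of Lagrangian Coefficients

Define

$$f = \frac{u_2 e^{\lambda_2 u_2} - u_1 e^{\lambda_2 u_1}}{e^{\lambda_2 u_2} - e^{\lambda_2 u_1}} - \frac{1}{\lambda_2} \tag{21}$$

We can easily show that

$$\lim_{\lambda_2 \to 0} f = \frac{u_1 + u_2}{2}, \qquad \lim_{\lambda_2 \to -\infty} f = u_1, \qquad \lim_{\lambda_2 \to +\infty} f = u_2$$

The curve of f as a function of $\lambda_2$, obtained through numerical simulations, is shown in the figure below.

[Figure: f as a function of λ2]

From the above figure and the limits of f, we can conclude that Eq. (20) has a unique solution $\lambda_2$.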
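The limits of f and its monotone shape can also be checked numerically. This sketch evaluates Eq. (21) in an overflow-safe form, using the identity $f = u_2 + (u_2-u_1)/(e^{\lambda_2(u_2-u_1)}-1) - 1/\lambda_2$ for negative $\lambda_2$ and the mirrored form $f = u_1 + (u_2-u_1)/(1-e^{-\lambda_2(u_2-u_1)}) - 1/\lambda_2$ for positive $\lambda_2$, on a grid with illustrative bounds $u_1 = 0.2$, $u_2 = 1$. It is a numerical check, not a proof of uniqueness.

```python
import math


def f(lam, u1, u2):
    """f(lambda_2) of Eq. (21), written so that large |lambda_2| does not overflow."""
    if lam == 0.0:
        return (u1 + u2) / 2                 # the lambda_2 -> 0 limit
    delta = u2 - u1
    if lam < 0:
        return u2 + delta / math.expm1(lam * delta) - 1.0 / lam
    return u1 - delta / math.expm1(-lam * delta) - 1.0 / lam


u1, u2 = 0.2, 1.0                             # illustrative bounds, not from the paper
grid = [-200 + 0.5 * k for k in range(801)]   # lambda_2 from -200 to 200
values = [f(lam, u1, u2) for lam in grid]
print(all(b > a for a, b in zip(values, values[1:])))   # strictly increasing on the grid?
print(f(-1e4, u1, u2), f(1e-6, u1, u2), f(1e4, u1, u2))
```

Because f increases from $u_1$ to $u_2$, every $\bar{u}$ strictly between the bounds is hit exactly once, and values below the midpoint correspond to $\lambda_2 < 0$.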
Abstract

Among the more commonly employed models for performance analysis of ATM networks, e.g. to dimension
buffers in switches, we find Markov modulated Poisson processes (MMPPs) and Markov modulated
Bernoulli processes (MMBPs). These models are often used with the only motivation that they are capable
of producing bursty traffic. Although this is true in a general sense, little is known about whether
that capability extends to the particular case of real traffics.

We report on an investigation where these models are tried in the latter sense. More precisely, we review
and try a number of methods proposed for fitting MMPPs (MMBPs) to observed traffic data. The data
consist of sixty traces which are extracted from the Bellcore Ethernet measurements according to length
(short, medium, and long) and local average load (light and heavy). We then compare the performance
of the buffer of a single server system when subject to the real traffic and to the fitted model, respectively.

It is found that the two cases differ significantly in terms of buffer occupancy, and that these differences
are caused by deficiencies in the different fitting methods and possibly also by limitations in the models
themselves. Nevertheless, some fitting methods are identified which, with further development, might
work as models of burstiness within limited time spans on the order of two seconds. We also briefly
comment on the relationship between our results and recent works on fractal traffic characteristics.

Keywords 

Bursty traffic model, ATM cell level traffic model, accuracy, Markov modulated Poisson Process, Markov
modulated Bernoulli Process, MMPP, MMBP.

The major part of this work was carried out while Christer Lind was with the Department of Communication
Systems, Lund Institute of Technology, Sweden.
1 MARKOVIAN MODELS FOR REAL ATM TRAFFICS
1.1  Markovian  Models 

Models of bursty traffic are frequently used in the context of performance analysis of ATM networks,
e.g. to dimension buffers in switches. Among the more commonly employed models we find the Markov
modulated Poisson processes (MMPPs) and Markov modulated Bernoulli processes (MMBPs). The two
processes are doubly stochastic point processes where the rate of a Poisson (Bernoulli) process is governed
by an underlying Markov chain in continuous (discrete) time. Arrivals and state transitions of the modulating
chain are statistically independent. The processes are fully characterised by the number of states s
in the modulating chain, the transition rates (probabilities) $Q_{u,v}$ and the arrival rates (probabilities)
in each state, $u, v \in \{1, \ldots, s\}$.

To  restrict  the  number  of  parameters,  the  number  of  states  s  is  often  set  equal  to  two,  in  which  case 
the  model  is  referred  to  as  a  Switched  Poisson  (Bernoulli)  Process,  or  an  SPP  (SBP).  In  the  special  case 
of  the  SPP  (SBP)  having  an  arrival  rate  of  zero  in  one  of  its  states,  the  process  is  called  an  Interrupted 
Poisson  (Bernoulli)  Process. 
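As a concrete picture of the SPP, here is a minimal simulation sketch in continuous time with two states. The function and parameter names are hypothetical: the modulating chain stays in each state for an exponential time, and arrivals within a sojourn are Poisson at that state's rate, independent of the switching. Setting one arrival rate to zero gives an IPP. It uses only Python's `random` module.

```python
import random


def simulate_spp(rates, switch_rates, t_end, seed=None):
    """Arrival times of a two-state MMPP (switched Poisson process) on [0, t_end).

    rates[k]         Poisson arrival rate while the modulating chain is in state k
    switch_rates[k]  rate at which the chain leaves state k (must be > 0)
    The chain starts in state 0, a simplification of this sketch.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < t_end:
        sojourn_end = min(t + rng.expovariate(switch_rates[state]), t_end)
        if rates[state] > 0:
            # Poisson arrivals during the sojourn; memorylessness lets us restart here
            a = t + rng.expovariate(rates[state])
            while a < sojourn_end:
                arrivals.append(a)
                a += rng.expovariate(rates[state])
        t, state = sojourn_end, 1 - state
    return arrivals


# Interrupted Poisson process: rate 10 when ON, silent when OFF, mean sojourns of 1
ipp = simulate_spp([10.0, 0.0], [1.0, 1.0], 1000.0, seed=1)
print(len(ipp))
```

With equal switching rates the chain spends half its time in each state, so the long-run arrival rate of this IPP is about 5 per time unit, but the arrivals come in clusters rather than evenly.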

The main reasons why these models are frequently employed are probably their ability to match various
burstiness characteristics, and their mathematical tractability. However, little is known about their actual
relevance when it comes to producing a traffic that is not only generally bursty, but that in some sense
is equivalent to real traffic.

The  current  work  is  a  preliminary  attempt  to  investigate  this  aspect  of  simple  Markovian  models, 
typically  SPPs  and  SBPs.  The  emphasis  of  the  work  is  on  their  suitability  for  performance  analysis,  in 
particular  with  respect  to  buffer  dimensioning.  The  general  idea  is  to  produce  cell  arrivals  to  an  infinite 
buffer  which  is  emptied  by  a  single  server,  and  study  the  number  of  cells  present  in  the  buffer  at  each 
arrival  instant.  A  model  that  in  our  sense  is  equivalent  to  real  traffic,  would  result  in  a  buffer  occupancy 
that  is  statistically  identical  to  that  of  a  real  traffic. 
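The experimental set-up described above (an infinite buffer, one server, buffer contents recorded at each arrival instant) can be sketched in a few lines. The constant per-cell service time is an assumption of this sketch, and the function name is hypothetical; the arrival times could come from a trace or from a fitted model such as an SPP.

```python
from collections import deque


def occupancy_at_arrivals(arrival_times, service_time):
    """Cells in an infinite FIFO buffer with one server, as seen by each arrival.

    arrival_times must be sorted; every cell needs the same service_time (assumed).
    Returns, for each arrival, the number of earlier cells still in the system.
    """
    in_system = deque()      # departure times of cells not yet completely served
    busy_until = 0.0
    seen = []
    for t in arrival_times:
        while in_system and in_system[0] <= t:
            in_system.popleft()               # that cell has already left
        seen.append(len(in_system))
        busy_until = max(busy_until, t) + service_time
        in_system.append(busy_until)
    return seen


print(occupancy_at_arrivals([0.0, 0.2, 0.4, 3.0], 1.0))   # [0, 1, 2, 0]
```

Comparing the empirical distribution of these counts for a real trace and for a fitted model is the kind of equivalence test described above.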

To our knowledge, very few papers have been published in which models are verified against real traffics
in terms of buffer occupancy. Instead, it appears that most researchers who verify models tend to do so
against other models (!). Notable exceptions are the papers on video modelling by
Frater et al. (1994) and Rose (1994), where the queueing behaviour of a real traffic is compared to that of
a model.


1.2  Replicating  Real  Traffics 

Users  wishing  to  establish  a  connection  over  an  ATM  network  are  required  to  declare  a  number  of 
parameters  characterising  the  traffic  they  wish  to  submit.  These  parameters  include  peak  rate,  sustainable 
rate,  burst  size,  and  possibly  others.  Typical  factors  affecting  the  choice  of  parameter  settings  include 
the  nature  of  the  application,  characteristics  of  the  user  premises  equipment,  the  access  medium,  and  the 
tariff  structures. 

When testing models under this scenario, the model should represent the traffic actually submitted, i.e.
after possible shaping by the policing device. For a given trace, the user could declare virtually any set
of parameters and deliver the traffic in a number of conforming and non-conforming ways. To avoid
restrictive presumptions regarding these parameters and the delivery, we assume that the parameters are set
such that the traffic can be passed transparently to the network, and therefore simply match the models
directly to the traces.
The procedure of matching a model to a traffic trace is referred to as a fitting method. The fitting
methods  considered  in  this  work  can  be  classified  in  three  categories:  Sequence  fitting,  direct  metrics 
fitting,  and  indirect  metrics  fitting.  It  is  pointed  out  that  the  choice  of  modelling  in  discrete  or  continuous 
time  is  more  a  matter  of  mathematical  convenience  than  of  replication  accuracy,  Arvidsson  et  al.  (1991). 

Sequence  Fitting 

The  idea  of  sequence  fitting  is  based  on  the  presumption  that  the  trace  is  in  fact  produced  by  a  specific 
model  the  parameters  of  which  are  unknown.  Fitting  a  model  to  a  trace  therefore  means  to  find  the  set 
of  parameters  of  this  particular  model  that  have  the  highest  likelihood  of  producing  that  sequence.  The 
typical  procedure  is  to  start  from  an  initial  guess  of  the  parameter  set  and  successively  improve  it  with 
respect  to  the  likelihood  of  obtaining  the  trace  until  no  further  improvement  can  be  obtained. 

We  have  used  two  methods  of  this  class,  one  due  to  Meier-Hellstern  (1987)  (KMH)  and  another  one 
due  to  Ryden  (1992)  (TR).  Both  are  developed  for  MMPPs  with  any  number  s  >  1  of  states,  but  are 
here  applied  to  the  case  s  =  2. 
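To make the likelihood criterion concrete, the following Python sketch evaluates the log-likelihood of a sequence of inter-arrival times under a 2-state MMPP and improves an initial guess by a crude multiplicative coordinate search until no step helps any more. It illustrates the general idea only and is not the KMH or TR algorithm, which are not reproduced here. Assumptions: the names lam1, lam2 (arrival rates in states 1 and 2) and r1, r2 (rates of leaving states 1 and 2) are hypothetical, and the modulating chain is assumed to start in the stationary distribution of its generator.

```python
import math


def expm2(a, b, c, d, t):
    """exp(t*M) for M = [[a, b], [c, d]] with b*c > 0, so M has real, distinct eigenvalues."""
    s = (a + d) / 2.0
    delta = math.sqrt(((a - d) / 2.0) ** 2 + b * c)  # half the gap between the eigenvalues
    ep, em = math.exp((s + delta) * t), math.exp((s - delta) * t)
    ch = (ep + em) / 2.0               # e^{st} cosh(delta t)
    sh = (ep - em) / (2.0 * delta)     # e^{st} sinh(delta t) / delta
    return [[ch + sh * (a - s), sh * b],
            [sh * c, ch + sh * (d - s)]]


def mmpp2_loglik(params, gaps):
    """Log-likelihood of inter-arrival times `gaps` under a 2-state MMPP.

    params = (lam1, lam2, r1, r2) with hypothetical naming; the chain is
    assumed to start in the stationary distribution of its generator Q.
    """
    lam1, lam2, r1, r2 = params
    a, b, c, d = -r1 - lam1, r1, r2, -r2 - lam2  # entries of Q - Lambda
    vec = [r2 / (r1 + r2), r1 / (r1 + r2)]
    ll = 0.0
    for x in gaps:
        m = expm2(a, b, c, d, x)
        # row vector * exp((Q - Lambda) x) * Lambda, rescaled each step to avoid underflow
        v0 = (vec[0] * m[0][0] + vec[1] * m[1][0]) * lam1
        v1 = (vec[0] * m[0][1] + vec[1] * m[1][1]) * lam2
        norm = v0 + v1
        if norm <= 0.0:
            return float("-inf")
        ll += math.log(norm)
        vec = [v0 / norm, v1 / norm]
    return ll


def fit_mmpp2(gaps, init, step=0.5, tol=1e-4, max_rounds=200):
    """Crude coordinate search on the log-parameters (not KMH or TR)."""
    best = list(init)
    best_ll = mmpp2_loglik(best, gaps)
    rounds = 0
    while step > tol and rounds < max_rounds:
        rounds += 1
        improved = False
        for i in range(4):
            for factor in (math.exp(step), math.exp(-step)):
                trial = list(best)
                trial[i] *= factor  # multiplicative steps keep every parameter positive
                ll = mmpp2_loglik(trial, gaps)
                if ll > best_ll:
                    best, best_ll, improved = trial, ll, True
        if not improved:
            step /= 2.0  # no coordinate helped: refine the step size
    return best, best_ll
```

When lam1 equals lam2 the process is Poisson and the log-likelihood reduces to n·ln(lam) − lam·Σx, which is a convenient sanity check. Like the iterative methods discussed in the text, such a search only finds a local optimum that depends on the initial guess.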

Direct  Metrics  Fitting 

Direct metrics fitting does not presume that a certain model is actually valid, but simply aims at making the model in question reproduce certain “important” and mathematically tractable properties of the trace. Typical properties fitted are moments and correlations of the inter-arrival times and of the number of arrivals within intervals of length t.

We have considered four such methods: Rossiter (1987) (MR), Heffes et al. (1986) (HL), Gusella (1991) (RG), and Park et al. (1994) (DP). The first three are developed for and applied to MMPPs with s = 2 states, and the last to MMBPs with s = 2 states.
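As a hedged illustration of the kind of metrics involved, the sketch below estimates the mean, squared coefficient of variation and lag-1 autocorrelation of the inter-arrival times, and the index of dispersion for counts over consecutive windows of length t. Which of these (or which other) metrics each of MR, HL, RG and DP actually matches is not reproduced here; the function names are hypothetical, arrival times are assumed to start at 0, and the estimators use a simple 1/n normalisation.

```python
def interarrival_stats(times):
    """Mean, squared coefficient of variation and lag-1 autocorrelation of the gaps."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    mean = sum(gaps) / n
    var = sum((g - mean) ** 2 for g in gaps) / n
    scv = var / mean ** 2
    if var == 0.0:
        return mean, scv, 0.0  # constant gaps: correlation undefined, report 0
    cov1 = sum((gaps[i] - mean) * (gaps[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return mean, scv, cov1 / var


def idc(times, t):
    """Index of dispersion for counts, Var N(t) / E N(t), over consecutive windows of length t."""
    nwin = int(times[-1] // t)  # only complete windows before the last arrival
    counts = [0] * nwin
    for x in times:
        k = int(x // t)
        if k < nwin:
            counts[k] += 1
    m = sum(counts) / nwin
    v = sum((c - m) ** 2 for c in counts) / nwin
    return v / m
```

For example, strictly alternating gaps of 1 and 3 give a lag-1 correlation of -1, while evenly spaced arrivals give an index of dispersion of 0.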

Indirect  Metrics  Fitting 

Indirect metrics fitting means that the observed process is first transformed into another process, which is then treated as in direct metrics fitting. The transformation procedure we have considered is the identification of “active periods” and “passive periods”, an idea first proposed by Jain et al. (1986). Active periods refer to uninterrupted sequences of one or more short inter-arrival times, and passive ones to uninterrupted sequences of one or more long inter-arrival times. Properties of interest in the transformed process include moments and correlations of the lengths of the two kinds of period and of the activity within each of them.

We have used four such approaches: Sole et al. (1990) (SDG/1 and SDG/4), Bonomi et al. (1994) (BMMP), and Lee et al. (1992) (LL). Both SDG methods refer to MMBPs with s = 2 states and allow for activities between zero and one during both periods. BMMP and LL refer to MMBPs with s = 3 states and MMPPs with s = 2 states respectively, and both prescribe strictly no activity during passive periods and strictly full activity during active ones.
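A minimal sketch of the active/passive transformation, assuming a single hypothetical threshold that separates short from long inter-arrival times. How the cited methods choose (or avoid) such a threshold is not reproduced here, and the function names are our own.

```python
def active_passive_periods(gaps, threshold):
    """Group consecutive inter-arrival times into alternating runs.

    Gaps below `threshold` are 'short' and form active periods; the others are
    'long' and form passive periods. Returns (kind, number_of_gaps, duration) tuples.
    """
    runs = []
    for g in gaps:
        kind = "active" if g < threshold else "passive"
        if runs and runs[-1][0] == kind:
            _, n, dur = runs[-1]
            runs[-1] = (kind, n + 1, dur + g)  # extend the current run
        else:
            runs.append((kind, 1, g))          # start a new run
    return runs


def mean_run_length(runs, kind):
    """Mean number of gaps per period of the given kind (0.0 if there is none)."""
    lengths = [n for k, n, _ in runs if k == kind]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

The resulting run lengths and durations are the kind of transformed process whose moments and correlations the indirect methods then fit.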

1.3  Preliminaries  of  the  Investigation 

It is well known that traffic characteristics depend heavily both on the source (e.g. video or data) and on the content (e.g. drama, sports, file transfer and WWW retrievals). It is also clear that, even for a given source and content, there is no such thing as a “typical behaviour”. A general investigation of traffic replicating properties would therefore require tremendous amounts of recorded traffic traces. We have restricted ourselves to one class of traffic, which could be labelled “LAN interconnect”. The motivation for this particular choice is twofold: LAN interconnect is expected to be one of the first kinds of traffic to be sent over ATM, and LAN traffic measurements were readily available to us through the Bellcore (1989) measurements.

Numerous papers, e.g. Leland et al. (1994), Paxon et al. (1994), Pruthi (1995), and others, have reported on the self-similar properties of these traffic traces, the presence of variations on all time scales, and the heavy-tailed buffer occupancy distributions resulting from them. These findings raise fundamental questions regarding the relevance of Markovian models, in particular those with small numbers of states s, whose variability spans a strictly limited range of time scales, cf. Andersen (1995).

The present work is, however, restricted to modelling variations within certain time scales. This is motivated by engineering aspects of buffer dimensioning, where loss constraints for slowly varying traffic may call for very large buffers, quite possibly large enough to violate delay constraints and even to exceed reasonable physical limitations. (This becomes obvious when looking at buffer sizes and performance for systems that store excess traffic generated during working hours and transmit it during the night.) Generally speaking, we can thus identify two kinds of variations: fast variations, which can be smoothed by a buffer, and slow variations, which cannot. We are only interested in the former.

For slow variations we can see at least three possible approaches. The first is to multiplex a very large number of independent sources, in which case even slow variations can be statistically multiplexed; the second is to provide enough transmission capacity to handle the peaks and simply put up with the resulting poor utilisation in the valleys; and the third is to track the slow variations and dynamically adjust the allocated transmission capacity accordingly. Our work is based on the last approach, and we presume the presence of a control mechanism that dynamically adjusts the capacity of the server to the long term average of the traffic load. We do not develop such a mechanism here, but only mention that it could be driven by user-initiated requests for more or fewer network resources following the opening or closing of applications (ftp, telnet, netscape etc.), with signals from system-initiated monitoring of traffic and/or buffers as an alternative or supplement.

In the language of ATM traffic control, variations are often said to take place on the cell scale (typically on the order of μs), burst scale (ms), activity scale (s), session scale (min) etc., see e.g. Bagnoli et al. (1994), Hui (1988), Key (1995), Ramamurthy et al. (1994), and others. Clearly, buffers are intended only for the cell, burst and possibly activity scales, hence modelling of these scales is sufficient from a buffer dimensioning point of view.


2  EXPERIMENTS  WITH  REAL  TRAFFIC  AND  MODELS 

2.1  Background 

We defined an experimental test bed based on the following scenario: a user wishes to convey LAN data over an ATM network. The LAN is a 10 Mbps Ethernet, and the user is connected transparently to the ATM network via a 34 Mbps link. Before the LAN packets are delivered over this link to the network, the Ethernet overhead is stripped off and the remaining data are packed into cells. Each 53 octet cell can take 44 octets of Ethernet data, since 4 octets of the 48 octet payload are used for AAL3/4 overhead, and the remaining 5 octets constitute the ATM header.
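The cell accounting above can be checked with a few lines of Python. The sketch assumes, as the text does, that only 4 octets per cell are lost to AAL3/4 and ignores any per-packet AAL3/4 header or trailer; the constant and function names are illustrative.

```python
import math

CELL_OCTETS = 53
ATM_HEADER_OCTETS = 5
AAL34_OCTETS_PER_CELL = 4  # per-cell AAL3/4 overhead, as stated in the text
DATA_PER_CELL = CELL_OCTETS - ATM_HEADER_OCTETS - AAL34_OCTETS_PER_CELL  # 44 octets


def cells_for_packet(data_octets):
    """Cells needed for one Ethernet packet after its overhead has been stripped."""
    return math.ceil(data_octets / DATA_PER_CELL)


def cell_time(link_bps=34e6):
    """Transmission time of one cell on the access link, in seconds."""
    return CELL_OCTETS * 8 / link_bps
```

For instance, 1500 octets of data need cells_for_packet(1500) = 35 cells, and one cell occupies roughly 12.5 μs of the nominal 34 Mbps link.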

We implemented a simulator with a server of capacity C and an infinite buffer. Cell arrivals follow sample traces from the Bellcore (1989) material converted to ATM as above, or are drawn from a mathematical model. The traces used were chosen according to length and load: three time scales were chosen, viz. 0.20, 2.00 and 20.0 seconds, and two long term loads, viz. 42% and 85% of the actual peak value observed in intervals of those lengths in the entire material made available to us. Finally, for each time scale and load condition, 10 distinct traces were selected, resulting in a total of 3 × 2 × 10 = 60 traces. The chosen time scales should cover the normal scope of buffer modelling well, i.e. cell scale and burst scale variations.

Figure 1 Experiments carried out.

For each trace, the transmission capacity C was set according to the formula for equivalent bandwidth given by Vakil (1993), $C = a(1 - \log(a/p))$, where a is the long term average rate (in our case over the entire trace) and p is the peak rate (in our case 34 Mbps). This setting is high enough to ensure that the system is not overloaded, while at the same time low enough to let queues build up during the peaks.
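A minimal sketch of the capacity setting, assuming the logarithm in Vakil's formula is the natural logarithm (the base is not stated in the text); the function name is hypothetical.

```python
import math


def equivalent_bandwidth(a, p):
    """C = a * (1 - ln(a / p)) for mean rate a and peak rate p, with 0 < a <= p."""
    if not 0 < a <= p:
        raise ValueError("need 0 < a <= p")
    return a * (1.0 - math.log(a / p))
```

With x = a/p, the ratio C/p = x(1 − ln x) is increasing on (0, 1] and equals 1 at x = 1, so the setting always lies between the mean rate and the peak rate, consistent with the remark above. For example, a = 10 Mbps and p = 34 Mbps give C ≈ 22.24 Mbps.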


2.2  Experiments 

In an initial series of runs, each real trace was used as the arrival generator in our simulator, figure 1. The trace was run repeatedly in order to emulate a local “steady state”. At each arrival instant we noted the number of cells present in the buffer, which was taken as the sole performance metric for the single server system, denoted in the figure as “Performance data 0” (PD0).
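One way to obtain the buffer content seen by each arrival is the Lindley-type recursion sketched below, assuming a deterministic server that empties C cells per second and counting a partly served cell as present. This is our illustration, not the authors' simulator; replaying the trace several times and keeping only the last pass is one hedged reading of “run repeatedly”, and the function name is hypothetical.

```python
import math


def occupancy_at_arrivals(arrival_times, capacity, trace_length, passes=3):
    """Cells found in the system by each arrival during the last replay of the trace.

    arrival_times: sorted cell arrival instants in [0, trace_length), in seconds.
    capacity: server capacity C in cells per second (deterministic service).
    The trace is replayed `passes` times back to back; earlier passes act as warm-up.
    """
    work = 0.0      # unfinished work (in cells) seen by the current arrival
    last_t = None
    seen = []
    for p in range(passes):
        for t in arrival_times:
            now = p * trace_length + t
            if last_t is not None:
                # add the previous arrival's cell, then drain at rate C until now
                work = max(0.0, work + 1.0 - capacity * (now - last_t))
            last_t = now
            if p == passes - 1:
                seen.append(math.ceil(work - 1e-12))  # a partly served cell counts as present
    return seen
```

For example, three cells arriving simultaneously find 0, 1 and 2 cells respectively, while arrivals spaced further apart than 1/C always find an empty system.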

Next, each of the traces was fed into each of the parameter fitting procedures mentioned above. This resulted in one set of model parameters for each trace and each fitting method. The models thus obtained were then used as traffic sources in our simulator, and the performance of the buffer was monitored as before. The observations are shown as “Performance data 1” (PD1) in figure 1.
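Using a fitted model as a traffic source requires drawing arrival times from it. A minimal sketch for a 2-state MMPP, using competing exponential clocks for arrivals and state changes; the parameter names are hypothetical, and the chain is started in its stationary distribution.

```python
import random


def mmpp2_trace(lam1, lam2, r1, r2, horizon, seed=1):
    """Arrival times of a 2-state MMPP on [0, horizon).

    lam1, lam2: arrival rates in states 1 and 2; r1, r2: rates of leaving
    states 1 and 2 (both must be > 0, i.e. a feasible, non-absorbing chain).
    """
    rng = random.Random(seed)
    lam, rate = (lam1, lam2), (r1, r2)
    state = 0 if rng.random() < r2 / (r1 + r2) else 1  # stationary start
    t, times = 0.0, []
    while True:
        total = lam[state] + rate[state]
        t += rng.expovariate(total)           # time to the next event of either kind
        if t >= horizon:
            return times
        if rng.random() < lam[state] / total:
            times.append(t)                   # an arrival; the state is unchanged
        else:
            state = 1 - state                 # the modulating chain switches state
```

Feeding such a trace back into the same fitting procedure gives the "fit to the fit" runs described below.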

Noting that the models are fitted directly to the traces, one would ideally expect the models to give the same buffer performance as the real traces, i.e. PD0 to be equal to PD1. However, it must be remembered
Table 1 Number of infeasible fits.


Model 

0.20 

sec. 

2.00  sec. 

20.0 

sec. 

used 

42%  load 

85%  load 

42%  load  85%  load 

42%  load 

85%  load 

KMH 

— 

— 

-  - 

— 

— 

TR 

— 

1 

—  — 

— 

— 

MR 

6 

10 

5 

— 

3 

HL 

4 

9 

4 

— 

— 

RG 

3 

10 

8 

1 

6 

DP 

5 

8 

1 

— 

— 

BBMP 

LL 

SDG/1 

— 

— 

-  - 

— 

— 

— 

— 

-  - 

— 

— 

SDG/4 

— 

— 

-  - 

— 

— 

that the models themselves cannot take all the blame for any differences detected; some may be due to deficiencies in the parameter estimation etc. We may thus say that any difference obtained between PD0 and PD1 consists of two components: one which is due to the model, and one which is due to the fitting method and our implementation thereof. We call the former component “model error”, the latter “method error”, and refer to the observed sum as “total error”.

In order to estimate the two components separately, a new set of experiments was conducted: the traffic produced in the above runs for each model and fitting method was recorded and fed to the same fitting procedure as the one used for the model under study, i.e. we fitted each model to itself. The resulting set of models was then used as arrival generators in our simulator, and the buffer performance again monitored as before. The results are indicated as “Performance data 2” (PD2) in figure 1. Since the models fitted to are, by definition, valid in this series of runs, there are no model errors, and any differences between PD1 and PD2 relate to method errors only. Loosely speaking, we may then obtain the model error by subtracting the method error from the total error.


3  RESULTS 

3.1  Validity 

For a set of model parameters to be feasible, we require that arrival rates are ≥ 0 and transition rates > 0 for MMPPs, and that arrival probabilities are ≥ 0 and ≤ 1 and transition probabilities > 0 and ≤ 1 for MMBPs. The requirements follow from physical interpretations, with the added condition that the modulating chain must not be absorbing. Not all fitting methods came up with feasible parameters for all samples. The number of failures is shown in table 1 for each model.
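The feasibility conditions can be expressed as simple checks. In this sketch the bounds on arrival rates and arrival probabilities are taken as inclusive, since LL and BMMP prescribe exactly zero activity during passive periods, while the strict lower bound on transition rates and probabilities encodes the non-absorbing requirement. The function names are hypothetical.

```python
def feasible_mmpp(rates, switch_rates):
    """rates: arrival rate per state; switch_rates: rate of leaving each state."""
    return all(x >= 0.0 for x in rates) and all(x > 0.0 for x in switch_rates)


def feasible_mmbp(arrival_probs, switch_probs):
    """Per-slot arrival probabilities and per-slot probabilities of leaving each state."""
    return (all(0.0 <= x <= 1.0 for x in arrival_probs)
            and all(0.0 < x <= 1.0 for x in switch_probs))
```

A zero transition rate or probability makes the corresponding state absorbing, which is the single failure observed for sequence fitting.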

The table shows that these anomalies occur almost solely for direct metrics fitting. The only exception to this rule is one sequence fit, where the modulating chain turned out to be absorbing. It is also seen that infeasible parameters are more often obtained at high loads and when fitting to short intervals. Notably, MR and RG failed for all ten traces in the most extreme case in this respect.

Table 2 Number of abnormal fits.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     1          -          -          -          6          -
TR      -          -          2          4          7          -
MR      -          -          -          -          -          -
HL      -          -          -          -          -          -
RG      -          -          -          -          1          -
DP      -          -          -          -          2          -
BBMP    -          -          -          -          5          2
LL      1          7          -          -          -          -
SDG/1   -          -          -          -          9          -
SDG/4   -          -          5          -          7          1

Infeasible parameters can be explained as follows. Direct fitting methods solve four equations in four metrics, from which the four MMPP (MMBP) parameters are found. The output of an MMPP (MMBP) is subject to certain limits regarding the relations between various metrics, and infeasible parameters for a certain trace therefore indicate that the model is incapable of exactly reproducing the metrics of that trace. In this case, one could alternatively take the nearest feasible solution as some kind of best fit. However, our work does not aim at developing or improving fitting methods, but is restricted to testing existing proposals.
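One concrete example of such a limit for the 2-state MMPP: its index of dispersion for counts, IDC(t) = Var N(t) / E N(t), can never fall below 1. The sketch evaluates the standard closed form for a stationary 2-state MMPP, with r1 and r2 (hypothetical names) the rates of leaving states 1 and 2. Whether a given method fits this particular metric is not asserted here; the point is only that any target IDC below 1 cannot be met with feasible parameters.

```python
import math


def mmpp2_idc(lam1, lam2, r1, r2, t):
    """IDC(t) of a stationary 2-state MMPP.

    IDC(t) = 1 + 2 r1 r2 (lam1 - lam2)^2 / (mean_rate (r1 + r2)^3)
               * (1 - (1 - exp(-(r1 + r2) t)) / ((r1 + r2) t))
    """
    s = r1 + r2
    mean_rate = (lam1 * r2 + lam2 * r1) / s  # stationary mean arrival rate
    g = 1.0 - (1.0 - math.exp(-s * t)) / (s * t)  # lies in [0, 1) for t > 0
    return 1.0 + 2.0 * r1 * r2 * (lam1 - lam2) ** 2 / (mean_rate * s ** 3) * g
```

Every term added to 1 is non-negative, so IDC(t) ≥ 1, with equality only in the Poisson case lam1 = lam2.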

Furthermore, for a fit to be meaningful, the resulting traffic model must produce an average queue length that is in the vicinity of the one obtained for the real traffic. We have, rather arbitrarily, classified as non-meaningful (abnormal) those results that differ by a factor of 10 or more from the target values. The occurrence of such cases is shown in table 2.
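The screening rule can be sketched as follows. The helper is hypothetical, and flagging a zero mean (for which no ratio can be formed) is our own assumption.

```python
def is_abnormal(mean_model, mean_real, factor=10.0):
    """True if the model's mean queue length is off by `factor` or more in either direction."""
    if mean_model <= 0.0 or mean_real <= 0.0:
        return True  # assumption: a ratio cannot be formed, so flag the fit for inspection
    ratio = mean_model / mean_real
    return ratio >= factor or ratio <= 1.0 / factor
```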

It is seen that abnormal fits occur almost only for sequence fitting and indirect fitting. A closer look at the numbers behind the table reveals that, for the entries referring to sequence fitting, the mismatches result in permanent overloads of the simulated system. This means that the considered methods, which are iterative, sometimes converge towards a solution that is not correct in terms of average arrival rate. Again, it is beyond the scope of this work to solve the problems behind this phenomenon. For the indirect fitting methods, the abnormal values are less severe, but simply point at weaknesses in the methods as such.


3.2  Accuracy 

Metrics 

We now remove the infeasible and abnormal fits from our data set and investigate the accuracy of the models with respect to the remaining runs. More precisely, we consider how well the various models and fitting methods can mimic real traffic with respect to dimensioning buffers over the selected time scales.
Let the occupancy of the buffer at an arrival instant be denoted by a stochastic variable Q and define two primary metrics of system performance, viz. E{Q}, the mean buffer occupancy over the entire distribution, and E{Q'}, the mean occupancy over the tail of the distribution,

$$E\{Q\} = \sum_{k=0}^{\infty} k\,p(k); \qquad E\{Q'\} = \sum_{k=k'}^{\infty} k\,p'(k)$$

where p(k) is the probability of an arriving customer finding k customers already in the queue, and p'(k) is p(k) renormalised over the tail. The tail is defined as all states k ≥ k', where k' is the smallest integer such that $\sum_{j=k'}^{\infty} p(j) \le 10^{-2}$. Note that the latter metric does not refer to a single point, which would have made it very sensitive, but to the rescaled average of the last percent of the distribution, and thus captures the tail in a wider sense. This number was chosen as a compromise between the tail probabilities relevant to buffer dimensioning, typically on the order of $10^{-9}$, and simulation feasibility and accuracy.
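The two metrics can be estimated from the integer occupancies recorded at arrival instants, as sketched below. We assume that k' is the smallest k for which the empirical probability P(Q ≥ k) is at most 10^-2, which is our reading of “the last percent of the distribution”; the function name is hypothetical.

```python
from collections import Counter


def queue_metrics(samples, tail_mass=1e-2):
    """Return (E{Q}, E{Q'}, k') from non-negative integer occupancies seen by arrivals."""
    n = len(samples)
    counts = Counter(samples)
    mean = sum(k * c for k, c in counts.items()) / n
    # k' = smallest k with P(Q >= k) <= tail_mass; `remaining` counts samples with Q >= k'
    remaining, k_prime = n, 0
    while remaining > tail_mass * n:
        remaining -= counts.get(k_prime, 0)
        k_prime += 1
    tail = {k: c for k, c in counts.items() if k >= k_prime}
    tail_n = sum(tail.values())
    # p'(k) = p(k) renormalised over the tail, so E{Q'} is the mean of the tail samples
    tail_mean = sum(k * c for k, c in tail.items()) / tail_n if tail_n else float("nan")
    return mean, tail_mean, k_prime
```

If a single top value carries more than 1% of the mass, the tail defined this way is empty and E{Q'} is reported as NaN; with long simulation runs this is unlikely.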

Adding to the notation, we let $Q_i$ be the performance metric observed from the ith data set in figure 1, i.e. $Q_0$ refers to the real trace, $Q_1$ to the fit to the real trace, and $Q_2$ to the fit to the fit. Finally, we define two metrics of the total error mentioned in figure 1 as

$$e_{tot}(Q) = 1 - \frac{E\{Q_1\}}{E\{Q_0\}}; \qquad e_{tot}(Q') = 1 - \frac{E\{Q'_1\}}{E\{Q'_0\}}$$

two metrics of the method error in the same figure as

$$e_{met}(Q) = 1 - \frac{E\{Q_2\}}{E\{Q_1\}}; \qquad e_{met}(Q') = 1 - \frac{E\{Q'_2\}}{E\{Q'_1\}}$$

and two metrics of the model error in the same figure as

$$e_{mod}(Q) = \frac{E\{Q_2\}}{E\{Q_1\}} - \frac{E\{Q_1\}}{E\{Q_0\}}; \qquad e_{mod}(Q') = \frac{E\{Q'_2\}}{E\{Q'_1\}} - \frac{E\{Q'_1\}}{E\{Q'_0\}}$$

so that $e_{tot} = e_{met} + e_{mod}$. In tables 3 to 8 these errors are given in per cent.
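The three definitions are linked: when e_met and e_mod are added, the E{Q2}/E{Q1} terms cancel, so e_tot = e_met + e_mod holds exactly for every trace, which is what justifies obtaining the model error by subtraction. A small sketch with made-up means (not data from this study) checks the identity; the function name is hypothetical.

```python
def error_components(q0, q1, q2):
    """Total, method and model error from E{Q} (or E{Q'}) of the real trace (q0),
    the fit to the real trace (q1) and the fit to the fit (q2)."""
    total = 1.0 - q1 / q0
    method = 1.0 - q2 / q1
    model = q2 / q1 - q1 / q0
    return total, method, model
```

For example, error_components(100.0, 64.0, 80.0) gives a total error of 0.36, a method error of -0.25 and a model error of 0.61, i.e. 36%, -25% and 61% in the units of the tables.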

Total  Errors 

Tables 3 and 4 show $e_{tot}(Q)$ and $e_{tot}(Q')$ respectively for each combination of time scale and load. The numbers shown refer to the average over all valid traces. Rather than providing standard deviations as a supplement, each entry in the tables was subject to a t-test, i.e. we tested whether the observed average, given the variations between the various traces, could in fact be an observation of a distribution with zero mean. The results are indicated by the stars next to each entry: the number of stars indicates the confidence with which the hypothesis is rejected, three stars meaning 99.9% certainty, two stars 99% certainty and one star 95% certainty. No stars thus indicate that the hypothesis cannot be rejected with an error probability below 5%, but not that the hypothesis is correct.
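The star notation can be reproduced with a one-sample t-test against zero. The sketch assumes a two-sided test (the text does not say which was used) and hard-codes standard tabulated critical values of Student's t for 2 to 10 valid traces (1 to 9 degrees of freedom); the function name is hypothetical.

```python
import math
import statistics

# Two-sided critical values of Student's t for df = 1..9 (standard tables)
T_CRIT = {
    0.95: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262],
    0.99: [63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.250],
    0.999: [636.619, 31.599, 12.924, 8.610, 6.869, 5.959, 5.408, 5.041, 4.781],
}


def stars(values):
    """'*', '**' or '***' if a zero mean is rejected at 95%, 99% or 99.9%, else ''."""
    n = len(values)
    if not 2 <= n <= 10:
        raise ValueError("the table covers 2..10 observations only")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        return "***" if mean != 0 else ""  # identical values: t is infinite or undefined
    t = abs(mean) / (sd / math.sqrt(n))
    result = ""
    for level, mark in ((0.95, "*"), (0.99, "**"), (0.999, "***")):
        if t > T_CRIT[level][n - 2]:  # df = n - 1, list index df - 1
            result = mark
    return result
```

With fewer traces the critical values grow quickly, which is one reason why entries based on few valid fits rarely carry stars.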

It is seen that large errors are frequent, and generally more so for the tail than for the whole distribution. We also note that while many models tend to overestimate the mean of the queue length, they still underestimate the tail, an observation in accordance with observations from heavy-tailed traffic.

Table  3  might  give  the  impression  that  some  of  the  methods  based  on  direct  metrics  fitting  perform 
reasonably  well  for  short  intervals  with  small,  non-significant  errors.  However,  it  must  be  remembered 
that  these  values  are  based  on  very  few  actual  observations  because  of  the  large  number  of  infeasible  fits, 
cf.  table  1.  A  similar  observation  holds  for  the  results  of  TR  in  the  cases  of  longer  traces. 

The same is true for the mean of the tail of the distribution: the only positions with small average errors which pass a test for zero are those that contain few entries, in particular the case with a time span of 2 seconds and a long term average load of 85%. An overall conclusion is that the generally large errors make it hard to find a “best model”, and selecting a “worst model” appears equally meaningless.

Table 3 Total errors observed for the mean of the whole of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     36 **      -136 ***   79 ***     53 ***     88 ***     77 ***
TR      48 ***     -117 **    -6         55 ***     2          10
MR      14         -          42 ***     34 **      58 ***     52 ***
HL      23 *       -          43 ***     41 **      56 ***     56 ***
RG      11         -          43 ***     36 *       53 ***     53 **
DP      18         -144       75 ***     42 ***     82 ***     67 ***
BBMP    52 ***     -34 **     82 ***     68 ***     86 ***     78 ***
LL      -409 ***   -246       -226 **    -232 *     -207 ***   -178 ***
SDG/1   7          -168 ***   65 ***     39 ***     -          68 ***
SDG/4   46 ***     -70 ***    76 ***     53 ***     83 ***     74 ***

Table 4 Total errors observed for the mean of the tail of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     -138 **    -700 ***   39 **      14         87 ***     74 ***
TR      -102 **    -648 ***   -54        22         3          41
MR      -204       -          -31        3          53 ***     48 ***
HL      -163 **    -          -42        -6         37 **      54 ***
RG      -187 **    -          -30        7          48 ***     64 **
DP      -177 *     -685       32 **      -1         76 ***     66 ***
BBMP    -54        -336 ***   51 ***     50 ***     80 ***     76 ***
LL      -1305 ***  -1014 *    -640 **    -377 *     -291 ***   -205 ***
SDG/1   -223 ***   -842 ***   7          -10        -          67 ***
SDG/4   -86 *      -532 ***   40 **      22 *       81 **      74 ***

Method  Errors 

We will now attempt to get a better idea of the origin of the errors, i.e. whether these should be attributed to the models themselves, or whether it is just as likely that the fitting procedure and our implementation thereof are to blame. Tables 5 and 6 show $e_{met}(Q)$ and $e_{met}(Q')$ in the same way as above. It is immediately seen that the method errors are by no means small or insignificant for any of the models but TR and BBMP. If only the mean is considered, the method errors of DP could also be referred to as small.

Table 5 Method errors for the mean of the whole of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     -16 ***    -19 ***    -14 ***    -16 ***    -12 ***    -15 ***
TR      -1 **      -3 ***     -3         -2 ***     -4         -2 **
MR      17         -          18         4          28 ***     14 *
HL      32 **      -          15 **      38 **      7 *        32 ***
RG      31 **      -          28 *       2          25 **      13
DP      0          1          1 ***      1 *        1 **       1 **
BBMP    -2 *       -1 *       -1         -1 **      -2         -1 ***
LL      -9         -46        -16        -40 **     -10        24 **
SDG/1   -26 ***    -19 ***    -13 *      -23 ***    -          -28 ***
SDG/4   23 ***     15 ***     31 **      26 ***     18 **      33 ***

Table 6 Method errors for the mean of the tail of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     -15 **     -14 ***    -15 *      -11 ***    -16 **     -8 ***
TR      1          0          0          -1         -2         -5
MR      14         -          15         3          25 ***     16 **
HL      29 *       -          11 *       35 **      7 *        29 ***
RG      30 *       -          29 **      4          25 **      10
DP      22 ***     19         24 ***     14 ***     9 *        3
BBMP    0          -4         -1         -2         -7         -2 *
LL      -28        -18        -18        -45 **     -7         27 **
SDG/1   -42 ***    -21 **     -9         -40 ***    -          -37 ***
SDG/4   28 ***     20 ***     34 **      24 ***     23 **      29 ***

It is clear that the fitting methods are not particularly stable when it comes to estimating the parameters of a model which they have essentially created themselves. This does not mean that the formulae provided in the various papers where the methods are put forward are incorrect. What it does say, however, is that the traffic characteristics we are concerned with result in parameters which are hard to estimate. That is, when the trace is fitted to a model for the first time, we get parameters which are in the range of hard-to-estimate MMPP (MMBP) parameters. The existence of such cases is already mentioned by many of the authors behind the fitting methods, see e.g. Meier-Hellstern (1987), Rossiter (1987), and Ryden (1992).

Table 7 Model errors for the mean of the whole of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     52 **      -117 ***   93 ***     69 ***     100 ***    92 ***
TR      49 ***     -115 **    -3         57 ***     7          13
MR      -2         -          25         30         31 **      38 ***
HL      -9         -          28 *       3          49 ***     23 **
RG      -21        -          15         33         27 **      40
DP      18         -144       74 ***     42 ***     81 ***     66 ***
BBMP    54 ***     -34 **     82 ***     69 ***     89 ***     80 ***
LL      -401 **    -200       -210 **    -191       -196 **    -202 ***
SDG/1   33 *       -150 ***   78 ***     62 ***     -          96 ***
SDG/4   23         -84 ***    45 **      27 **      65 ***     41 ***

Table 8 Model errors for the mean of the tail of the distribution.

Model   0.20 sec.             2.00 sec.             20.0 sec.
used    42% load   85% load   42% load   85% load   42% load   85% load
KMH     -123 *     -686 ***   54 **      25 **      103 ***    82 ***
TR      -103 **    -648 ***   -54        23         5          46
MR      -219       -          -45        0          28 **      31 **
HL      -192 ***   -          -53        -41        30 *       25 *
RG      -217 **    -          -60 *      3          22 *       55 *
DP      -198 *     -704       8          -15        67 ***     63 ***
BBMP    -54        -331 ***   51 ***     52 ***     87 ***     78 ***
LL      -1277 ***  -996       -622 **    -332 *     -285 ***   -232 ***
SDG/1   -181 **    -822 ***   15         31 **      -          104 ***
SDG/4   -114 **    -552 ***   6          -2         58 **      44 ***

Model  Errors 

We will finally try to get an idea of the applicability of the models themselves, irrespective of the particular fitting method used. Tables 7 and 8 show $e_{mod}(Q)$ and $e_{mod}(Q')$ in the same way as above.

It is noted that the model errors are of the same order as the total errors and larger than the method errors. Compared with the former accuracy measures, the differences between the various fitting methods remain, hence our attempt to separate the fitting method from the model is not entirely successful. The tables clearly show that no model succeeds in accurately predicting both the mean of the whole distribution and the mean of its tail. As before, low values are almost exclusively noted in conjunction with a large number of failed fits. This makes it hard to point at any particularly successful or promising model.

Figure 2 Comparison of buffer occupancies resulting from real and artificial traces for a trace of 2.00 seconds with 85% load (panels: real trace, model trace). The artificial trace is produced by an SPP fitted by means of the HL-method. The upper plots refer to the whole trace and the lower ones to the first 2% of the trace.


Some  Detailed  Results 

To  get  a  deeper  understanding  of  the  results,  we  have  arbitrarily  selected  a  case  for  which  reasonably 
good  agreement  was  obtained  in  the  study  above,  viz.  direct  fitting  for  2  second  intervals. 

The two plots in figure 2 show the buffer occupancy distributions for the real and artificial traffic respectively. The two curves appear quite similar at first glance. On closer inspection, however, the two differ around zero and in their tails: the real data have a lower value at the origin and exhibit a knee in the tail, while the model data have a higher value at the origin and a straight tail. These findings agree with what has been suggested by Pruthi (1995) and others: Markov-type models result in buffer occupancy distributions with exponential tails, while many real traffic streams result in power-law tails.

Given the similarity of the curves, these differences might be regarded as minor details, but it must be remembered that the models are to be used for determining loss probabilities on the order of $10^{-9}$. In terms of our performance metrics, the similar shapes are reflected by the means, $E\{Q_0\} = 53.46$ for the real data and $E\{Q_1\} = 49.39$ for the model, while the different tails result in $E\{Q'_0\} = 251.8$ and $E\{Q'_1\} = 462.2$ respectively.
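As a worked example for this single trace, inserting these values into the total-error definitions $e_{tot}(Q) = 1 - E\{Q_1\}/E\{Q_0\}$ and $e_{tot}(Q') = 1 - E\{Q'_1\}/E\{Q'_0\}$ gives

$$e_{tot}(Q) = 1 - \frac{49.39}{53.46} \approx 0.076, \qquad e_{tot}(Q') = 1 - \frac{462.2}{251.8} \approx -0.836$$

i.e. the model underestimates the mean buffer occupancy by about 8% but overestimates the mean of the tail by about 84%.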

Figure 3 shows plots of the number of arrivals within intervals, N(t, t + Δt), vs. t for the real and artificial traces. The upper plots refer to the whole trace and the lower ones show the first 2% of the trace in more detail.

Figure 3 Comparison of real and artificial traces for a trace of 2.00 seconds with 85% load. The artificial trace is produced by an SPP fitted by means of the HL-method. The upper plots refer to the whole trace and the lower ones to the first 2% of the trace.

It is noted that there are no apparent, fundamental differences on the large time scale between the real trace and the artificial one. However, the two lower plots show that this does not seem to hold at the higher frequencies. This again confirms the results by Pruthi (1995) and others that the lower frequencies govern the average queue length, and hence the average delay, while the higher frequencies are critical to the tail of the queue and therefore to the loss probability in the case of an infinite buffer.

Noting a reasonable agreement for the overall mean but a poorer one for the tail, it is tempting to conclude that the models covered by our investigation might be more useful for calculating delays than losses. Tables 3 to 8 do not, however, support such a conclusion in general.


4  CONCLUSIONS  AND  FURTHER  WORK 

We  have  tried  a  number  of  methods  proposed  for  fitting  an  MMPP  (MMBP)  to  observed  traffic  data. 
Sixty  data  sets  were  extracted  from  the  Bellcore  Ethernet  measurements  according  to  length  and  local 
average  load,  so  that  short,  medium  and  long  periods  of  both  light  and  heavy  loads  were  tried.  We  then 
compared  the  performance  of  the  buffer  of  a  single  server  system  when  subject  to  the  real  traffic  and 
when  subject  to  traffic  from  the  fitted  model. 

Several cases of infeasible parameters were recorded. A simple solution to this problem might be to restate the various methods as constrained optimisation problems, where a best fit under the condition of feasible parameters is determined.

It was found that the two cases differ significantly in terms of buffer occupancy, and that these differences are caused by deficiencies in the different fitting methods and possibly also by limitations in the models themselves.

Restricting ourselves to shorter time spans of up to two seconds, we noted that the direct metrics fitting methods produced the smallest errors and resulted in traces that appeared identical to the real ones on a large time scale. We therefore conclude that the most promising candidates for a “good model and fitting method” are found in this group, though further work is needed to clarify the importance of, and methods for, fitting a wider range of frequencies before “safe” models can be devised.

Moreover, if time scales of two seconds can be modelled, there is no reason why shorter time spans could not be mastered too, provided the problem of fitting to a small data set can be solved. On the other hand, it also seems clear from the tables that the chances of finding small MMPPs (MMBPs) that remain valid for time scales of 20 seconds and above are fairly slim.

This work differs from what is normally published on modelling and fitting bursty traffic in that we use real traffic. This has meant developing new practices, and has made it difficult to reach neat conclusions regarding a perfect model and fitting method.

We believe, however, that there is enough real data available to stop validating models against models and to use real data instead. This work constitutes a first step in this direction, and we hope to have inspired others to continue this important work. We can identify a large number of issues that need to be looked into, for instance:

•  Finding traffic characteristics which are relevant from the point of view of buffer dimensioning and for which simple and robust estimation techniques can be devised. Some theoretical proposals are given in, e.g., Andrade et al. (1991) and Grünenfelder et al. (1994).

•  Finding fitting methods for these characteristics which always produce the best physically feasible fit. A first approach would be to modify the methods tried here.

•  Repeating our investigation with many more than ten samples per time scale and utilisation level.

•  Repeating our investigation as above for traffic sources other than the Bellcore Ethernet.
We also note that if modelling short time scale variations is to be useful, a number of issues must be resolved, for example how to handle the long term variations in practice. Possible candidates include reallocations of transmission capacity according to network predictions (e.g. by monitoring cell flow or buffer contents) or users’ requests (e.g. when opening or closing particular applications or application modes).
Abstract

We propose a traffic management mechanism for connectionless networks on top of ATM infrastructures. The mechanism combines flow control at the packet layer (connectionless layer) and dynamic bandwidth allocation of the ATM connections interconnecting the connectionless servers of the connectionless network. Optimal mechanisms are obtained through Markov decision processes for a model of two tandem queues. The obtained bandwidth gain motivates the analysis of such mechanisms in a more realistic model. The simulation of a more detailed model of a connectionless network allows us to conclude that dynamic resource allocation has a favorable impact on the bandwidth gain and reduces the sensitivity of the performance of the network with respect to the characteristics of the traffic. The traffic management mechanism implemented in the simulator is motivated by the optimal mechanism obtained using the analytical model.


Keywords 

ATM  Networks,  connectionless  services,  traffic  management,  Markov  decision  processes 
1  INTRODUCTION


One of the first expected applications of ATM networks is LAN interconnection. Since ATM is a connection-oriented transfer mode, the provision of connectionless services (LANs are connectionless) is a challenge for ATM networks. ITU Recommendation I.211 has proposed two approaches to offer a connectionless service in B-ISDN, namely the direct and indirect methods. This paper deals with the first one, which consists of introducing ConnectionLess Servers (CLSs) into B-ISDN. These CLSs are interconnected through ATM connections, thus forming an “overlay network”. The functions of the CLSs are mainly to route packets (datagrams) and to manage connectionless traffic (Vickers, 1994).

Congestion control in this network can be achieved as in classical datagram networks by means of mechanisms combining dynamic routing and flow control. When ATM connections are used instead of leased lines, a third traffic management approach can be introduced: dynamic bandwidth allocation. By taking advantage of the flexibility of ATM connections, it may allow a gain in bandwidth utilization.

Mechanisms using dynamic bandwidth allocation have been proposed in previous papers (Gallassi, 1992; Heijenk, 1992; Mongiovi, 1991), but mainly for the indirect approach (no CLS). In (Gallassi, 1992), it is argued that the proposed mechanism works well in the direct case, but no performance study is given. In (Yamamoto, 1993), a mechanism combining flow control and dynamic bandwidth allocation is proposed for both direct and indirect approaches; however, in the direct case, Yamamoto et al. propose to use dynamic bandwidth allocation only on the access links. They argue that only a simple feedback-type flow control mechanism is necessary between CLSs, because the fluctuation of traffic on such links is small due to the high degree of multiplexing that is achieved. Unfortunately, as shown in (Van den Berg, 1995), the high unpredictability of connectionless traffic makes effective dimensioning of fixed-bandwidth links extremely difficult.


The goal of this paper is, on the one hand, to show that a non-negligible gain in the utilization of bandwidth can be obtained by using mechanisms combining flow control and dynamic bandwidth allocation and, on the other hand, to determine the mechanisms allowing such a gain. The gain is studied with respect to the situations where no mechanism or only flow control is used.

The structure of this paper is as follows. In Section 2, a simple model of two tandem queues is studied in order to gain insight into the type of mechanisms that optimize the bandwidth gain. A Markov decision process approach is used to model the system. In Section 3, a more realistic network is studied by simulation, and it is shown that the proposed mechanism, motivated by the results in Section 2, allows a gain in bandwidth utilization while ensuring some QoS constraints, and reduces the sensitivity of the performance of the network with respect to the traffic characteristics. We conclude in Section 4.
2  TANDEM  QUEUES


2.1  The  queueing  system 

The system we study consists of two tandem queues with finite buffers (see Figure 1). Queue 1 may represent an individual source or a Connectionless Server, whereas queue 2, which is the queue of interest, represents a CLS. Packets (datagrams) enter the system following an Interrupted Poisson Process (IPP). The choice of an IPP was made in order to capture the bursty nature of connectionless traffic. A more realistic arrival process would be of little interest for this study, which aims to obtain qualitative results on the gain that can be achieved by using dynamic bandwidth allocation when traffic may be bursty. Service times are exponentially distributed. The service rates vary dynamically, representing, respectively, flow control for the first queue and dynamic bandwidth allocation for the second. More precisely, a controller is placed at queue 2 and may decide either to control queue 1 by reducing its service rate from $\mu_2$ (normal rate) to $\mu_1$ (controlled rate, with $\mu_1 < \mu_2$), to ask for an increase of the service rate of queue 2 from $\nu_1$ (normal rate) up to $\nu_N$ (maximum rate), or to leave the service rates at their normal values ($\mu_2$ and $\nu_1$).


Figure 1 Tandem controlled queues (arrivals: IPP$(Q, \lambda)$; queue 1: service rate $\mu \in \{\mu_1, \mu_2\}$; queue 2: service rate $\nu \in \{\nu_1, \ldots, \nu_N\}$).


The notations are the following:

•  $Q = \begin{pmatrix} -q & q \\ r & -r \end{pmatrix}$ is the infinitesimal generator of the phase of the IPP. This means that $q$ (respectively $r$) is the rate of passage from silent period to active period (respectively from active period to silent period) of the arrival process.

•  $\lambda$ is the intensity of the arrival process when active (i.e. the peak rate of the arrival process).

From the preceding definitions, we observe that the mean rate of the IPP is equal to

$$\frac{1/r}{1/r + 1/q}\,\lambda = \frac{q}{q + r}\,\lambda.$$

•  $b$ is the activity parameter of the arrival process, which we define as the ratio mean rate/peak rate; it is given by $b = \frac{q}{q + r}$.

•  $\mu_i$, $i \in \{1, 2\}$, are the possible service rates of queue 1.

•  $N$ is the number of allocation levels for the server of queue 2 (i.e. $N$ is the number of possible service rates of queue 2).

•  $\nu_j$, $j \in \{1, \ldots, N\}$, are the possible service rates of queue 2.

•  $K_1$ (respectively $K_2$) is the capacity of queue 1 (respectively queue 2).
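As a quick check of the mean-rate formula, the following sketch (an illustration, not part of the original study) simulates an IPP directly and compares the empirical arrival rate with $q\lambda/(q+r)$. The function names and the choice to start in the silent phase are assumptions; the example values $q = 1$, $r = 4$, $\lambda = 25$ are arbitrary and give $b = 0.2$ and a mean rate of 5.

```python
import random


def ipp_mean_rate(q, r, lam):
    """Mean rate of an IPP: fraction of time active, q/(q+r), times the peak rate lam."""
    return q / (q + r) * lam


def activity(q, r):
    """Activity parameter b = mean rate / peak rate = q/(q+r)."""
    return q / (q + r)


def simulate_ipp(q, r, lam, horizon, seed=1):
    """Count IPP arrivals in [0, horizon], starting in the silent phase (an assumption)."""
    rng = random.Random(seed)
    t, active, count = 0.0, False, 0
    while t < horizon:
        # Exponential sojourn: leave silent at rate q, leave active at rate r.
        end = min(t + rng.expovariate(r if active else q), horizon)
        if active:
            # Poisson arrivals at rate lam during the active sojourn [t, end).
            s = t + rng.expovariate(lam)
            while s < end:
                count += 1
                s += rng.expovariate(lam)
        t, active = end, not active
    return count


q, r, lam = 1.0, 4.0, 25.0
print(ipp_mean_rate(q, r, lam), simulate_ipp(q, r, lam, 20_000.0) / 20_000.0)
```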
We denote by $(x, y, z)$ a state where $x$ is the number of packets in queue 1, $y$ is the number of packets in queue 2 and $z$ the phase of the IPP (0 if it is silent, 1 if it is active).

Our goal is to find an efficient and simple mechanism combining flow control and bandwidth allocation. We therefore use the theory of Constrained Markov Decision Processes (Puterman, 1994) in order to determine an optimal mechanism satisfying some QoS constraints. The optimality is defined with respect to a cost function, which is an increasing function of the service rate of queue 2.

In  the  following  subsection,  the  studied  controlled  process  is  described. 

2.2  The  Markov  decision  process 

We consider a continuous-time Markov decision process with the following characteristics:

•  the state space is $S = [0, K_1] \times [0, K_2] \times \{0, 1\}$;

•  the action space is $A = \{(\mu_i, \nu_j),\ i \in \{1, 2\},\ j \in \{1, \ldots, N\}\}$;

•  the transition rates are the same as for a model with fixed rates, with $\mu$ and $\nu$ depending on the chosen action. More precisely, the evolution of the process is determined by a family of infinitesimal generators indexed by the action space;

•  the instantaneous cost function, denoted by $c[(x, y, z); \mu, \nu]$, is given below.
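To give an idea of the size of the problem, the sketch below enumerates $S$ and $A$ with the parameter values chosen in Section 2.3 ($K_1 = 15$, $K_2 = 25$, two rates for queue 1, five for queue 2). The product $|S|\,|A|$ is the number of state-action pairs, which is also the number of variables in the linear-programming formulation used to solve the problem. Function names are illustrative.

```python
from itertools import product


def state_space(K1, K2):
    """S = [0,K1] x [0,K2] x {0,1}: (queue-1 packets, queue-2 packets, IPP phase)."""
    return list(product(range(K1 + 1), range(K2 + 1), (0, 1)))


def action_space(mus, nus):
    """A = {(mu_i, nu_j)}: one service rate for queue 1 and one for queue 2."""
    return list(product(mus, nus))


S = state_space(15, 25)
A = action_space((5, 200), (5, 50, 100, 150, 200))
print(len(S), len(A), len(S) * len(A))   # 832 10 8320
```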

The evolution of the process is as follows. At each transition time $t$ of the process, a decision is taken, that is, a pair of service rates $(\mu_t, \nu_t)$ is chosen in $A$. The process then evolves from its current state $(X_t, Y_t, Z_t)$ in $S$ to its next state $(X_{t'}, Y_{t'}, Z_{t'})$ with rates given by the infinitesimal generator $Q(\mu_t, \nu_t)$. Between $t$ and $t'$, a cost is incurred at rate $c(X_t, Y_t, Z_t; \mu_t, \nu_t)$. Once the transition into the next state has occurred, a new decision is taken.

The objective is to find a policy $\pi$ (i.e. a rule selecting an action at each decision epoch) minimizing the average cost, i.e. the following function (Puterman, 1994):

$$V_\pi(x, y, z) = \limsup_{T \to \infty} \frac{1}{T}\, E_\pi^{(x,y,z)}\left[\int_0^T c(X_t, Y_t, Z_t; \mu_t, \nu_t)\, dt\right],$$

under the constraint:

$$\limsup_{T \to \infty} \frac{1}{T}\, E_\pi^{(x,y,z)}\left[\int_0^T \left(1_{\{X_t = K_1\}} + 1_{\{Y_t = K_2\}}\right) dt\right] \le \alpha.$$

Here $1_{\{\cdot\}}$ is the indicator function of an event and $E_\pi^{(x,y,z)}$ denotes the expectation with respect to the probability induced by policy $\pi$ and initial state $(x, y, z)$ (see (Puterman, 1994), chapter 2, for a rigorous presentation of the probabilistic framework for Markov decision processes). The constraint imposes that the probability of saturation (i.e. the probability that one of the buffers is full) does not exceed $\alpha$. Under irreducibility assumptions, the optimal average cost is known not to depend on the initial conditions (Puterman, 1994); furthermore, several algorithms exist for computing the optimal cost and policies.

Computing an optimal policy first requires transforming the initial problem into a discrete-time one; for this purpose we use the uniformization technique (Serfozo, 1979). This approach, which is very common in Markov decision process theory, yields a discrete-time Markov decision process that is equivalent to the initial one in the sense that every optimal policy for one problem is optimal for the other.
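A minimal sketch of uniformization for a fixed policy, assuming a small generator given as a list of rows: $P = I + Q/\Lambda$, with $\Lambda$ at least the largest exit rate. Because $\pi P = \pi$ holds exactly when $\pi Q = 0$, both chains share the same stationary distribution and hence the same long-run average of the cost. The power-iteration solver and the 10% margin on $\Lambda$ are illustrative choices, and the example uses only the two-state IPP phase process rather than the full tandem-queue chain.

```python
def uniformize(Q, margin=1.1):
    """Uniformized transition matrix P = I + Q / lam for a CTMC generator Q.

    lam must be at least max_i(-Q[i][i]); taking it slightly larger (margin > 1)
    leaves a self-loop in every state, which keeps the chain aperiodic so that
    power iteration converges.
    """
    n = len(Q)
    lam = margin * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    return P, lam


def stationary(P, iters=2000):
    """Stationary distribution of P by power iteration (adequate for small chains)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_cost(pi, cost):
    """Long-run average of a state cost: sum_s pi(s) * c(s)."""
    return sum(p * c for p, c in zip(pi, cost))


# IPP phase process alone (q = 1, r = 4): stationary probabilities (0.8, 0.2).
Q = [[-1.0, 1.0], [4.0, -4.0]]
P, lam = uniformize(Q)
pi = stationary(P)
print(pi, average_cost(pi, [0.0, 1.0]))  # cost 1 while active = fraction of time active
```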

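We then use the classical Linear Programming formulation of Markov decision processes (Derman, 1970). This formulation, which easily accommodates the constraint on the probability of saturation, leads to the following linear programming problem:

$$\text{minimize} \quad \sum_{i=1}^{2} \sum_{j=1}^{N} \sum_{(x,y,z) \in S} c[(x,y,z); \mu_i, \nu_j]\, \xi[(x,y,z); \mu_i, \nu_j]$$

subject to the following constraints:

$$\xi[(x,y,z); \mu_i, \nu_j] \ge 0, \quad \forall (x,y,z) \in S,\ \forall i = 1, 2,\ \forall j = 1, \ldots, N,$$

$$\sum_{i=1}^{2} \sum_{j=1}^{N} \sum_{(x,y,z) \in S} \xi[(x,y,z); \mu_i, \nu_j] = 1,$$

$$\sum_{i=1}^{2} \sum_{j=1}^{N} \xi[(x',y',z'); \mu_i, \nu_j] = \sum_{i=1}^{2} \sum_{j=1}^{N} \sum_{(x,y,z) \in S} \xi[(x,y,z); \mu_i, \nu_j]\, P[(x',y',z') \mid (x,y,z); \mu_i, \nu_j], \quad \forall (x',y',z') \in S,$$

$$\sum_{i=1}^{2} \sum_{j=1}^{N} \sum_{(x,y,z) \in S} \left[1_{\{x = K_1\}} + 1_{\{y = K_2\}}\right] \xi[(x,y,z); \mu_i, \nu_j] \le \alpha,$$

where $P$ is the transition probability matrix obtained after uniformizing the process, and $\xi[(x,y,z); \mu_i, \nu_j]$ can be interpreted as the long-run fraction of time during which the process is in state $(x,y,z)$ and action $(\mu_i, \nu_j)$ is chosen.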
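Since the LP variables $\xi$ behave as long-run state-action frequencies, a stationary policy can be read off a solution by normalizing $\xi$ over the actions in each state; because of the saturation constraint, the resulting policy may randomize between actions in some states. The sketch assumes the LP solution is available as a dictionary keyed by (state, action) and uses made-up values; the LP solver itself is not shown, and the function names are hypothetical.

```python
def policy_from_occupation(xi):
    """Turn occupation measures xi[(state, action)] into a randomized stationary policy.

    Returns {state: {action: probability}}. States with zero total occupation are
    omitted: they are not visited in the long run, so any action may be used there.
    """
    totals = {}
    for (s, a), w in xi.items():
        totals[s] = totals.get(s, 0.0) + w
    policy = {}
    for (s, a), w in xi.items():
        if totals[s] > 0.0:
            policy.setdefault(s, {})[a] = w / totals[s]
    return policy


def saturation(xi, K1, K2):
    """Left-hand side of the saturation constraint: sum of [1{x=K1} + 1{y=K2}] * xi."""
    return sum(w * ((s[0] == K1) + (s[1] == K2)) for (s, a), w in xi.items())


# Hypothetical LP output (values made up; they sum to 1 as the normalization requires).
xi = {((0, 3, 1), (200, 5)): 0.7494,
      ((0, 3, 1), (200, 50)): 0.2498,
      ((15, 10, 1), (5, 200)): 0.0008}
print(policy_from_occupation(xi))
print(saturation(xi, 15, 25))   # compare with alpha = 1e-3
```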
In the following subsection, the obtained results are presented.


2.3  Results 

The objective of this study is to examine the impact of dynamically allocating bandwidth. To that purpose, a first system (system FC, Flow Control), where only one service rate is available for queue 2 (i.e. $N = 1$), is compared with a second one (system FCDA, Flow Control and Dynamic Allocation), where the service rate of queue 2 may take several values ($N > 1$).

For system FC, we look for the minimum service rate $\nu$ for which the constraint can be satisfied (that is, for which there exists a policy satisfying the constraint). Linear programming indeed indicates whether the constraint is feasible.

For  system  FCDA,  the  service  rate  of  queue  2  is  allowed  to  vary  in  a  wide  range  (see 
next  paragraph)  so  that  feasibility  is  always  achieved. 

Choice  of  parameters: 

Here are the values chosen for the parameters of the model. Although they do not reflect realistic figures, mainly because of the size of the state space when considering large buffers, reasonable proportions were kept between the different values. The qualitative results obtained in all the experiments we carried out were quite similar.


•  The capacities of queues 1 and 2 are $K_1 = 15$ and $K_2 = 25$.
•  The following values are considered for the possible service rates of queue 1 and queue 2:

$\mu_1 = 5$, $\mu_2 = 200$.

$N = 5$ and $\nu_1 = 5$, $\nu_2 = 50$, $\nu_3 = 100$, $\nu_4 = 150$, $\nu_5 = 200$.

•  The characteristics of the arrival process vary throughout our study as follows: arrival processes with different values of the activity parameter (from 0.025 to 0.5) and of the mean burst length (5 and 10 packets) are considered.

The whole set of parameters of the IPP is set by imposing a constant mean rate equal to 5.

•  The target saturation probability $\alpha$ is equal to $10^{-3}$.

•  The considered instantaneous cost function is $c[(x,y,z); \mu, \nu] = \nu^2$. A function of $\nu$ increasing faster than linearly is chosen in order to reflect the following fact. When dealing with dynamically allocated bandwidth for bursty traffic, a request for a large amount of bandwidth costs the network operator more than several requests for smaller amounts. This is because the multiplexing gain that can be achieved decreases strongly with the peak rate of the considered bursty traffic (Roberts, 1991).
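With the mean rate fixed at 5, the remaining IPP parameters follow from the activity parameter and the mean burst length by inverting the relations given for the IPP in Section 2.1. A minimal sketch, assuming that the mean burst length in packets is the mean number of arrivals during an active period, $\lambda/r$; for instance, $b = 0.025$ gives a peak rate $\lambda = 200$. The function name is hypothetical.

```python
def ipp_parameters(mean_rate, b, burst_packets):
    """IPP parameters (q, r, lam) from mean rate, activity b and mean burst length.

    Assumes the mean burst length is the mean number of packets per active period,
    i.e. lam / r.
    """
    lam = mean_rate / b              # b = mean rate / peak rate
    r = lam / burst_packets          # mean active period 1/r carries burst_packets arrivals
    q = b * r / (1.0 - b)            # from b = q / (q + r)
    return q, r, lam


for b in (0.5, 0.1, 0.025):
    for burst in (5, 10):
        q, r, lam = ipp_parameters(5.0, b, burst)
        print(f"b={b:<6} burst={burst:<3} q={q:.4f} r={r:.4f} peak={lam:.1f}")
```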

The square root of the average cost, referred to as the “normalized cost”, is taken as the measure of the required bandwidth.
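A small numerical illustration with made-up time fractions: two allocation profiles for queue 2 with almost the same time-averaged rate can have very different normalized costs. The normalized cost is the quadratic mean of $\nu$, so it is always at least the time-averaged rate and grows with the time spent at large rates. Function names are hypothetical.

```python
from math import sqrt


def normalized_cost(time_fractions):
    """sqrt of the average of nu**2, given {rate nu: long-run fraction of time used}."""
    return sqrt(sum(f * nu ** 2 for nu, f in time_fractions.items()))


def mean_rate(time_fractions):
    """Time-averaged service rate of queue 2."""
    return sum(f * nu for nu, f in time_fractions.items())


steady = {100: 1.0}                  # always at nu_3 = 100
bursty = {5: 0.5, 200: 0.5}          # alternate between nu_1 and nu_5
for profile in (steady, bursty):
    print(mean_rate(profile), round(normalized_cost(profile), 2))
```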

Influence  of  the  traffic  characteristics  on  optimal  cost 

Figure  2  shows  the  normalized  cost  obtained  for  the  two  mechanisms  as  a  function  of  the 
activity  parameter  of  the  arrival  process.  The  left-hand  figure  is  obtained  for  bursts  with 
mean  length  5  packets  and  the  right-hand  one  for  bursts  with  mean  length  10  packets. 

Figure 2 Comparison of FC and FCDA mechanisms (left: burst size 5; right: burst size 10).

It can be observed that, in both cases, there is a gain in cost (that is, in utilization of bandwidth) when dynamic bandwidth allocation is used. This gain decreases as the activity parameter increases (for bursts of size 5, it ranges from 14.5% for activity parameter 0.5 to 17.8% for activity parameter 0.025; for bursts of size 10, it ranges from 7% for activity parameter 0.5 to 60% for activity parameter 0.025). The gain in bandwidth utilization thus appears to be considerable when the activity parameter on the considered links is low (i.e. the peak rate is much larger than the mean rate). This gives hope for a substantial gain in the utilization of bandwidth in CLS networks, where it is not possible to assert that activity parameters will be high.

It can also be concluded that the obtained mechanism greatly reduces the sensitivity of the performance of the network with respect to the traffic characteristics. This simplifies the dimensioning of network resources.

Structure  of  optimal  policies 

From the preceding paragraph, it can be concluded that dynamic bandwidth allocation seems to lead to a substantial gain in bandwidth utilization. In this paragraph, we aim to gain insight into the design of efficient mechanisms. To that purpose, the structure of the optimal policies obtained using linear programming is studied. Figures 3 and 4 represent the optimal actions as a function of the occupancies of buffers 1 and 2, when the arrival process is active ($z = 1$). The burst size is equal to 8 packets and the activity parameter varies from 0.1 to 0.025.


Figure 3 Optimal FCDA mechanism, Flow Control + Rate Allocation, versus occupancy of buffer 1 (left: activity parameter 0.1, right: activity parameter 0.05).

Note that the larger the peak rate of the arrival process compared to the mean rate, the more the obtained optimal mechanism relies on rate allocation and the less on flow control.

The considerable gain that can be achieved using the optimal policy motivates the analysis of a more realistic model of a CLS network using flow control and dynamic bandwidth allocation. We focus on the case of a low activity parameter, and so we analyze a traffic management mechanism motivated by the right-hand part of Figure 4. The optimal mechanism represented in this figure is approximated by a simpler one of the type represented in Figure 5.

The  proposed  traffic  management  mechanism  is  described  in  detail  in  the  following 
section. 
Figure 4 Optimal FCDA mechanism, Flow Control + Rate Allocation, versus occupancy of buffer 1 (left: activity parameter 0.033, right: activity parameter 0.025).

Figure 5 Approximated FCDA mechanism (Flow Control + Rate Allocation).


3  SIMULATION  STUDY 

3.1  The  simulation  model 

The considered model consists of a network of CLSs fed by bursty sources (see Figure 6). Packets of constant size (equal to 1.5 Kbyte) arrive, following an Interrupted Poisson Process, at the buffer of a source before being transmitted to the network. A very simple topology is considered for the CLS network: two CLSs (CLSs B and C) transmit packets to a third one (CLS A). There is a non-negligible propagation time between sources and CLSs and between CLSs. This propagation time also affects the information exchanged by the flow control and bandwidth allocation procedures. The capacity of all CLS buffers is equal to 1000 packets (1.5 Mbytes). These buffers are assumed to be dedicated to an output link of the corresponding CLS. Different cases will be analyzed for the source buffer size, as described later.


Figure 6 The simulation model (sources with arrivals; CLSs A, B and C interconnected through ATM; flow control and bandwidth allocation).


The proposed Flow Control mechanism is the following. When the occupancy of the buffer of a CLS reaches the level $T_{ctrl}$, this CLS asks for a reduction of the transmission rates on its input links. Each CLS measures the arrival rate into its buffer and reduces its input rate so as to obtain a fixed ratio controlled input rate/output rate (denoted by $\rho$). This control is exercised until the occupancy of the buffer of the congested CLS falls below the level $T_{rise}$ ($T_{rise} < T_{ctrl}$).
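The flow control rule is a two-threshold (hysteresis) scheme, sketched below for a single CLS buffer as an illustration. Defaults follow the parameter list given under "Choice of parameters". Propagation delays and signalling to upstream nodes are ignored, the class and method names are hypothetical, and splitting the reduced input rate across input links in proportion to their measured rates is an assumption.

```python
class FlowControl:
    """Sketch of hysteresis flow control at a CLS buffer (thresholds in packets)."""

    def __init__(self, t_ctrl=900, t_rise=850, rho=0.8):
        assert t_rise < t_ctrl
        self.t_ctrl, self.t_rise, self.rho = t_ctrl, t_rise, rho
        self.controlling = False

    def update(self, occupancy):
        """Start control at T_ctrl; release it once occupancy falls below T_rise."""
        if not self.controlling and occupancy >= self.t_ctrl:
            self.controlling = True
        elif self.controlling and occupancy < self.t_rise:
            self.controlling = False
        return self.controlling

    def input_limits(self, measured_rates, output_rate):
        """Rate limits for the input links, or None when no control is exercised."""
        if not self.controlling:
            return None
        target = self.rho * output_rate       # controlled input rate / output rate = rho
        total = sum(measured_rates)
        if total <= target:
            return list(measured_rates)
        return [x * target / total for x in measured_rates]


fc = FlowControl()
print([fc.update(b) for b in (800, 900, 870, 849, 860)])     # [False, True, True, False, False]
print(fc.update(950), fc.input_limits([60.0, 40.0], 100.0))  # True [48.0, 32.0]
```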

The proposed Dynamic Allocation mechanism is the following. When the occupancy of the buffer of a CLS reaches the level $S_{alloc} < T_{ctrl}$, this CLS asks for an increase of the rate on its output link. Depending on the state of the ATM links, this request may or may not be accepted. Each time it is accepted, the rate of the output link is increased by a fixed amount $P$ Mbit/s. The congested CLS waits some time $F$ (for filter) before requesting more bandwidth again if it is still congested. Congestion is considered to be over as soon as the occupancy of the CLS buffer falls below the level $S_{rise}$ ($S_{rise} < S_{alloc}$). In this case, the rate of the output link returns to its initial value (which is the minimum of the possible values of this rate). The probability of refusal ($P_{refusal}$) is an increasing function of the already allocated bandwidth. More precisely, it is assumed to be of the following form:

$$P_{refusal} = \min\left(\frac{\nu_r - \nu_i}{P}\, P_f,\ 1\right),$$

where $\nu_i$ is the initial rate and $\nu_r$ is the requested rate.

If an increase is refused, the CLS waits some time $W$ before requesting bandwidth again if it is still congested.
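The allocation rule, including the refusal probability, can be sketched as a small state machine. This is an illustration under stated assumptions, not the simulator used in the study: defaults are taken from the parameter list under "Choice of parameters", there is no propagation or signalling delay, and treating a CLS as "still congested" from the moment its occupancy reaches $S_{alloc}$ until it falls below $S_{rise}$ (with timers cleared at that point) is an interpretation. Names are hypothetical.

```python
import random


def refusal_probability(requested, initial, P=30.0, Pf=0.1):
    """P_refusal = min(((nu_r - nu_i) / P) * Pf, 1)."""
    return min((requested - initial) / P * Pf, 1.0)


class DynamicAllocator:
    """Sketch of the output-link allocation rule of a CLS (times in seconds)."""

    def __init__(self, initial_rate, s_alloc=600, s_rise=500, P=30.0, Pf=0.1,
                 F=0.020, W=0.070, seed=0):
        self.initial = self.rate = initial_rate
        self.s_alloc, self.s_rise = s_alloc, s_rise
        self.P, self.Pf, self.F, self.W = P, Pf, F, W
        self.congested = False
        self.next_request = 0.0               # earliest time of the next request
        self.rng = random.Random(seed)

    def step(self, now, occupancy):
        if occupancy < self.s_rise:
            self.congested = False
            self.rate = self.initial          # congestion over: back to the initial rate
            self.next_request = now           # assumption: timers are cleared
        elif occupancy >= self.s_alloc:
            self.congested = True
        if self.congested and now >= self.next_request:
            requested = self.rate + self.P
            p = refusal_probability(requested, self.initial, self.P, self.Pf)
            if self.rng.random() < p:
                self.next_request = now + self.W   # refused: wait W
            else:
                self.rate = requested
                self.next_request = now + self.F   # accepted: wait F (filter)
        return self.rate


alloc = DynamicAllocator(initial_rate=60.0, seed=42)
for t, occ in [(0.00, 620), (0.02, 640), (0.04, 660), (0.10, 480)]:
    print(t, alloc.step(t, occ))
```

With $P_f = 0.1$ and $P = 30$ Mbit/s, the first request is refused with probability 0.1, and any request 300 Mbit/s or more above the initial rate is always refused.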
Choice  of  parameters

Here is the list of the parameters chosen for the simulation study. Since some parameters, especially those concerning the arrival processes into the sources, change during our experiments, the given values are valid unless otherwise stated.

•  number of sources: 80

•  mean rate of sources: 1 Mbit/s

•  peak rate of sources: 100 Mbit/s

•  mean burst duration: 200 ms

•  $P_f = 0.1$, $P = 30$ Mbit/s, $W = 70$ ms, $F = 20$ ms, $\rho = 0.8$

•  $S_{alloc} = 600$ packets, $S_{rise} = 500$ packets

•  $T_{ctrl} = 900$ packets, $T_{rise} = 850$ packets

A few remarks have to be made about the choice of these thresholds. The dimensioning of $T_{ctrl}$ is mainly linked to the distance between sources and CLSs or between CLSs, i.e. to the time between congestion notification and the effective decrease of the rate. In principle, one could choose this threshold so that, even if all input links of the congested CLS are “active” (i.e. emitting packets), all packets emitted before the reaction of the sources are absorbed, so that no packets are lost. This would require excessively large CLS buffers, or a poor utilization of these buffers most of the time. The choice of 900 is a reasonable compromise between utilization of the buffer and efficiency of the control mechanism.
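The compromise described above can be quantified roughly: the $1000 - T_{ctrl}$ packets of buffer space above the control threshold must absorb the excess arrivals during the delay between congestion notification and the actual rate decrease. The sketch assumes 1.5 Kbyte = 1500 bytes and a hypothetical excess rate of 100 Mbit/s (input arriving that much faster than the output link drains it); the actual propagation delays of the study are not reproduced here, and the function name is hypothetical.

```python
def headroom_time(buffer_pkts, t_ctrl, pkt_bytes, excess_rate_bps):
    """Seconds the space above the control threshold can absorb an excess input rate
    (input minus output) before packets are lost."""
    headroom_bits = (buffer_pkts - t_ctrl) * pkt_bytes * 8
    return headroom_bits / excess_rate_bps


print(headroom_time(1000, 900, 1500, 100e6))   # 0.012 s, i.e. 12 ms
```

Under these assumptions, a notification-to-reaction delay longer than about 12 ms at that excess rate would lead to losses, which is why $T_{ctrl}$ is tied to the distances between nodes.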

There is a trade-off in the choice of $T_{rise}$: if it is too close to $T_{ctrl}$, this leads to numerous oscillations between control and release of the control*; if it is too far, this imposes very long periods of control on the controlled sources or CLSs.

The choice of $S_{alloc}$ and $S_{rise}$ determines the frequency of allocation requests, so that the trade-off is between efficiency and frequency of allocations. $S_{rise}$ should not be chosen too low for another reason, namely the utilization of the allocated bandwidth: if $S_{rise}$ is too low, there is a risk that the buffer will empty between the instant when the decision to decrease the output rate is taken and the instant when this rate is actually decreased. In this case, the additional allocated bandwidth will be under-utilized.

3.2  Simulation  results 

In this section it is shown, by means of numerical results, that the global traffic management mechanism we propose allows a gain in the utilization of bandwidth.

The simulations were carried out with SIMSCRIPT II.5.

Influence  of  the  characteristics  of  input  traffic 

We first compare the system without any control (curves denoted NM, for No Mechanism), with traffic control (curves denoted FC, for Flow Control) and with traffic control and dynamic bandwidth allocation (curves denoted FCDA, for Flow Control and Dynamic Allocation). These three systems will also be denoted, respectively, by NM, FC and FCDA. In the first experiments, the buffers of the sources are assumed to have an infinite capacity.

*This generates a non-negligible traffic of control messages.

In all cases, the “required bandwidth” is defined as the minimum bandwidth necessary to guarantee a quality of service defined by a loss probability of at most $10^{-4}$ and a delay of less than 30 ms for 95% of the packets (here the delay is the time a packet spends in the system from the instant it enters a source to the instant it leaves CLS A).
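The definition of required bandwidth amounts to a QoS test plus a search for the smallest bandwidth that passes it. The sketch below is a hedged illustration: `run(bw)` is a hypothetical simulator interface returning per-packet delays and loss counts, `fake_run` supplies made-up numbers, and scanning sorted candidates assumes that QoS only improves as bandwidth grows.

```python
def meets_qos(delays_ms, sent, lost, max_loss=1e-4, bound_ms=30.0, fraction=0.95):
    """Loss probability <= max_loss and at least `fraction` of the delivered
    packets delayed by less than bound_ms."""
    if sent == 0 or not delays_ms:
        return False
    on_time = sum(1 for d in delays_ms if d < bound_ms)
    return lost / sent <= max_loss and on_time >= fraction * len(delays_ms)


def required_bandwidth(candidates, run):
    """Smallest candidate bandwidth whose run meets the QoS (None if none does).
    run(bw) -> (delays_ms, sent, lost) is a hypothetical simulator interface."""
    for bw in sorted(candidates):
        if meets_qos(*run(bw)):
            return bw
    return None


def fake_run(bw):
    """Toy stand-in for a simulation run (made-up numbers)."""
    delays = [10.0] * 95 + [40.0] * 5
    return delays, 100_000, (5 if bw >= 60 else 50)


print(required_bandwidth([40, 50, 60, 70], fake_run))   # 60
```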

Figure  7  shows  the  bandwidth  required  for  the  output  links  of  CLSs  B  and  C  as  a  function 
of  the  mean  burst  duration  (the  mean  silence  duration  is  varied  in  the  same  proportion  as 
the  mean  burst  duration  in  order  to  maintain  the  mean  arrival  rate).  For  system  FCDA, 
this  bandwidth  is  by  definition  the  weighted  average  of  the  different  rates  which  are  used, 
with  weights  being  given  by  the  proportion  of  time  each  rate  is  used.  The  main  conclusions 
are  the  following: 

•  The FCDA mechanism allows a saving, for a mean burst duration of 400 ms, of 39% compared with system NM and of 9% compared with system FC.

•  The gain of the FCDA mechanism with respect to the FC mechanism is an increasing function of the mean burst duration. Since the maximum burst duration considered corresponds to a burst length of 4.8 Mbytes, greater values of this parameter, which are not unrealistic in the context of LAN interconnection, could lead to a more substantial gain.

•  The slope of the curve corresponding to the FCDA mechanism is lower than for the other two mechanisms, which reflects a lower sensitivity of this mechanism to burst length, a parameter that is difficult to predict.
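Two small calculations behind Figure 7, as a hedged sketch. First, the FCDA bandwidth is a time-weighted average of the allocated rates, illustrated here with a made-up allocation history. Second, the 4.8 Mbyte burst length is consistent with the 100 Mbit/s peak rate over the 400 ms mean burst duration if a Mbyte is read as $2^{20}$ bytes (5 000 000 bytes is about 4.77 × $2^{20}$). Function names are hypothetical.

```python
def time_weighted_rate(segments):
    """Rates weighted by the proportion of time each is used.
    segments: list of (duration_s, rate_mbps)."""
    total = sum(d for d, _ in segments)
    return sum(d * r for d, r in segments) / total


def burst_bytes(peak_rate_bps, burst_s):
    """Bytes emitted during a burst at the peak rate."""
    return peak_rate_bps * burst_s / 8


# Made-up allocation history: 8 s at 50 Mbit/s, 1.5 s at 80, 0.5 s at 110.
print(time_weighted_rate([(8.0, 50.0), (1.5, 80.0), (0.5, 110.0)]))   # 57.5
b = burst_bytes(100e6, 0.400)          # 100 Mbit/s peak, 400 ms mean burst
print(b, round(b / 2**20, 1))          # 5000000.0 4.8
```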


Figure  7  Influence  of  mean  burst  duration. 

Figure 8 shows the influence of the activity parameter of the arrival process on the required bandwidth. When varying the activity parameter, we also vary the number of sources so that the total mean rate of arrivals remains constant. The goal here is to find out whether the network operator, once the bandwidth is allocated, is able to use it efficiently whatever the profile of the connectionless traffic. Here the mean burst duration is constant and equal to 200 ms, and the duration of the silence intervals is varied.
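With the mean burst duration fixed at 200 ms, the activity parameter determines both the mean silence duration, from $b = \text{burst}/(\text{burst} + \text{silence})$, and the number of sources needed to keep the total mean rate constant. The sketch assumes the peak rate stays at 100 Mbit/s and the total mean rate is held at the default 80 sources × 1 Mbit/s = 80 Mbit/s (activity 0.01); the other activity values and the function name are hypothetical.

```python
def sources_and_silence(b, total_mean_mbps=80.0, peak_mbps=100.0, burst_ms=200.0):
    """Number of sources and mean silence duration for activity parameter b.

    Mean rate per source = b * peak; silence = burst * (1 - b) / b.
    """
    n_sources = total_mean_mbps / (b * peak_mbps)
    silence_ms = burst_ms * (1.0 - b) / b
    return n_sources, silence_ms


for b in (0.005, 0.01, 0.02):
    n, s = sources_and_silence(b)
    print(f"b={b}: {n:.0f} sources, mean silence {s:.0f} ms")
```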
The main conclusion which can be drawn from this figure is that the FC and FCDA mechanisms seem to be almost insensitive to the activity parameter of the sources. This is a very interesting feature since, while the activity parameter of typical sources for connectionless services is expected to be very low, it is also expected to vary a lot from one type of source to another.


Figure  8  Influence  of  the  activity  parameter. 


The two preceding experiments have highlighted some interesting features of the FCDA mechanism. Even so, its gain with respect to the FC mechanism has not proved considerable.

However, until now these mechanisms have only been compared in a context where, because of the chosen rate of the output link of CLS A, few control requests were imposed on CLSs B and C. We now change these conditions by adding a background source that feeds CLS A, with a peak rate of 120 Mbit/s and an activity parameter of 0.1. This of course increases the load on CLS A.

Figure 9 shows the results obtained. Only the FC and FCDA mechanisms are compared, but two cases are considered for the FCDA mechanism, depending on the assumed probability that allocation requests are refused. FCDA 1 corresponds to the same case as before, i.e. a refusal probability of 0.1. FCDA 2 corresponds to a refusal probability of 0.05.

The required bandwidth is of course greater than in the previous situation, where no external traffic was imposed on CLS A. Under these conditions, the gain of the FCDA mechanism with respect to the FC mechanism becomes significant. It reaches 15% for the same refusal probability as before, and as much as 25% for the lower refusal probability. This probability is difficult to characterize, since it reflects the ability of the ATM network, given its load, to allocate more bandwidth to the CLS output links. It depends mainly on the CAC function used by the network. However, the cases investigated here do not seem to us exceedingly optimistic, which gives reason to hope that in a real case the proposed mechanism will yield even better performance.
Figure 9 Influence of load on CLS A.


Influence  of  the  distance 

Until now, we have only studied the case where the source buffers were assumed to have infinite capacity. In this paragraph we remove this restriction and, in particular, we analyze the influence of the network size on the required source buffer sizes. Ideally, these buffer requirements should depend only weakly on the network size.

Figure 10 shows how the distance between sources and CLS, and between CLSs, affects the source buffers. In this experiment, the bandwidth in the FC case and the initial bandwidth in the FCDA case correspond to the minimum values needed to guarantee the required QoS. The sources have a mean burst duration of 100 ms, and the global load remains as in the previous sections.


Figure 10 Influence of distance on source buffers.

We conclude that the proposed mechanism allows smaller source buffers to be used and reduces the sensitivity of these buffer sizes to the size of the network.
Transient behaviour

To better understand the impact of the mechanism, the next figures show a typical transient behaviour of the FC and FCDA mechanisms in a congestion situation. Figure 11 shows the evolution of the occupancy of the buffer of CLS B during a typical period of congestion. At the beginning of the represented period, the CLS had made a request for bandwidth allocation, which was accepted. The output link rate therefore becomes 255 Mbit/s† instead of the former 225 Mbit/s. As the figure shows, this allocation does not prevent the buffer occupancy from reaching the threshold $T_{ctrl1}$ (900 packets). It does, however, shorten the period during which the sources are controlled. Moreover, when the FC mechanism is used, the period of control is followed by a second one. The cause is the release of the controlled sources, which have accumulated many packets while being controlled. This second period of control leads to the saturation of two source buffers and to losses in one of them (see Figure 13).


Figure 11 Buffer occupancy of CLS B (FC and FCDA mechanisms).


The FCDA mechanism avoids this phenomenon mainly because, when the sources are released, the CLS has a larger output rate at its disposal, which prevents a second period of "high congestion". Two secondary reasons also contribute. First, as noted above, the period of control is slightly shorter with this mechanism, so the sources are less stressed than with the FC mechanism. Second, Figure 12 shows that, even when flow control is applied, the FCDA mechanism reduces the source rates less than the FC mechanism does. This is because the coefficient by which the source rates are reduced is a function of the output rate of the congested CLS, since the objective of the flow control mechanisms studied is to reach a target load $\rho$. Note the slight delay, corresponding to the propagation delay, between the time of the control decision ($t_1$) and the time at which the source rates are effectively reduced ($t_2$) (see Figures 11 and 12).
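A simple proportional rule illustrates how a larger output rate leads to a milder rate reduction. This sketch is hypothetical and is not the authors' formula, which is not given in this section. It assumes that the congested CLS multiplies all source rates by a common coefficient k = ρ · C / R, capped at 1, where C is the CLS output rate, R is the aggregate input rate and ρ is the target load. The 225 and 255 Mbit/s output rates are those of Figure 11. The target load and input rate are made-up values.

```python
def reduction_coefficient(output_rate: float, input_rate: float, target_load: float) -> float:
    """Hypothetical rule: scale source rates by k = target_load * C / R, never above 1."""
    if input_rate <= 0.0:
        return 1.0  # nothing to reduce
    return min(1.0, target_load * output_rate / input_rate)


rho = 0.8           # hypothetical target load
input_rate = 300.0  # hypothetical aggregate input rate at CLS B, Mbit/s
k_fc = reduction_coefficient(225.0, input_rate, rho)    # FC: output rate stays at 225 Mbit/s
k_fcda = reduction_coefficient(255.0, input_rate, rho)  # FCDA: after the accepted allocation
print(f"FC: k = {k_fc:.2f}, FCDA: k = {k_fcda:.2f}")  # FC: k = 0.60, FCDA: k = 0.68
```

Under this rule the sources keep 68% of their rate instead of 60%, which is consistent with the qualitative behaviour described for Figure 12.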

Figure 13 makes clear the direct effects of the flow control on packet losses at the sources. It represents the occupancy of the buffers of three of the sources during the same period as the above figures. With the FCDA mechanism there is no loss. With the FC mechanism, the two periods of control cause a progressive saturation of two of the sources (one of which remains saturated even after the observed period) and losses in one of them.

†We assume ATM interfaces at 622 Mbit/s. Recall that only 155 Mbit/s and 622 Mbit/s interfaces have been defined by the ITU.

Figure 12 Source rates.


Figure 13 Source buffers (FC and FCDA).


4  CONCLUSIONS 

We propose a traffic management mechanism that combines flow control and dynamic bandwidth allocation in connectionless networks using an ATM network infrastructure. The mechanism improves the utilization of the ATM network capacity and reduces the sensitivity of the connectionless network performance to the traffic characteristics. Our study will proceed in two directions. First, we are taking into account the usage parameter control standardized by ETSI for the CBDS (Connectionless Broadband Data Service) service. In that case, the backpressure flow control to the sources used in the present study should not constrain input traffic that complies with the negotiated traffic contract. Second, we are analyzing the performance of a connectionless network on top of an ATM network offering the ABR transfer capability, in order to compare it with the performance of the system analyzed in the present paper.
Abstract


ATM networks support a wide range of multimedia traffic. Various BISDN VBR sources generate traffic at significantly different rates, and this traffic often has time-varying characteristics that are not yet well understood. Traffic management techniques, however, require traffic parameters that can capture these characteristics and adapt to the changing network environment. In this paper, we present a novel neural network approach to characterize and predict the complex arrival process. We discuss the FIR multilayer perceptron model and its training algorithm, and show that the FIR neural network can adaptively predict the traffic by learning the relationship between past and future traffic variations. Based on the experimental results, we conclude that the FIR neural network is an attractive tool for traffic prediction and hence has excellent potential for use in some congestion control schemes.

Keywords 

ATM,  traffic  prediction,  FIR  neural  networks 


1  Introduction 

Asynchronous Transfer Mode (ATM) has been recommended by CCITT as the transfer mode for the future broadband ISDN (BISDN). ATM networks are expected to support a diverse set of applications, such as data, voice and video, each with different traffic characteristics. Accurate characterization of the multimedia traffic is essential in order to develop a robust set of traffic descriptors. Such a set is required by two algorithms. The Usage Parameter Control (UPC) algorithm uses it for traffic enforcement. The Connection Admission Control (CAC) algorithm uses it for bandwidth allocation that exploits the statistical multiplexing gain. However, no comprehensive measurements yet exist that allow designers to address the characteristics of the various communication services in a realistically accurate manner. This is especially true for Variable Bit Rate (VBR) traffic.

During a connection, the periods in which a source generates traffic are referred to as active periods. Silent periods correspond to the time between active periods, during which no traffic is generated. Traffic generated by a VBR source either alternates between active and silent periods, or is a continuous bit stream with varying rates. This traffic is highly bursty and correlated in comparison with a Poisson process. Burstiness can be defined in two ways. One is the ratio of the peak bit rate to the average bit rate. The other is the squared coefficient of variation $c^2$ of the cell interarrival times, i.e. the variance divided by the square of the mean. For example, $c^2$ for the packet arrival process from a single voice source is 18.1, while $c^2$ for a Poisson process is 1 [Sriram 86, Heffes 86]. The aggregate packet arrival process with many components does behave like a Poisson process over relatively short time intervals. Under heavy loads, however, the congestion in the multiplexer is determined by the behaviour of the arrival process over much longer time intervals, where it does not behave like a Poisson process. Characterizing traffic from VBR sources is therefore very difficult.
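Both burstiness measures are easy to estimate from a trace. The sketch below uses only the Python standard library, and the function names are illustrative. For exponentially distributed interarrival times (a Poisson process), the estimate of $c^2$ should be close to 1. Strongly bursty sources, such as the voice source cited above, give much larger values.

```python
import random
from statistics import fmean, pvariance


def squared_cv(interarrivals: list[float]) -> float:
    """Squared coefficient of variation c^2 = Var[T] / E[T]^2 of interarrival times T."""
    mean = fmean(interarrivals)
    return pvariance(interarrivals) / mean ** 2


def peak_to_mean(rates: list[float]) -> float:
    """Burstiness as the ratio of the peak bit rate to the average bit rate."""
    return max(rates) / fmean(rates)


# Poisson process: exponential interarrival times with rate 1.
random.seed(1)
samples = [random.expovariate(1.0) for _ in range(100_000)]
print(f"c^2 for Poisson arrivals: {squared_cv(samples):.2f}")
```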

As mentioned above, the congestion control schemes (e.g., CAC and UPC) in ATM networks require specific knowledge of the statistical behaviour of the input traffic, declared via its traffic descriptors. Parameters such as peak bit rate, average bit rate and burst length are often used as a simple set of parameters characterizing the traffic. More complicated second-order time-domain parameters are also used to capture the burstiness and correlation properties of the arrival process, especially those of VBR video and voice sources [Habib 92]. Examples are the index of dispersion for intervals (IDI) and the index of dispersion for counts (IDC). In [Heffes 86], the aggregate arrival process from N voice sources is approximated by a nonrenewal process, namely a two-state Markov Modulated Poisson Process (MMPP). In [Daigle 86], very complex mathematical models such as semi-Markov processes and continuous-time Markov chains are used to characterize voice traffic. Traffic descriptors based on simple parameters cannot accurately characterize very rapid bit rate variations of the traffic over short intervals, and they often ignore its bursty nature. On the other hand, mechanisms using more sophisticated parameters are computationally expensive and impractical.
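The index of dispersion for counts is a good example of a second-order descriptor. At time scale t it is IDC(t) = Var[N(t)] / E[N(t)], where N(t) is the number of arrivals in a window of length t. It equals 1 for a Poisson process at every t, and values well above 1 indicate burstiness at that time scale. The minimal sketch below assumes that arrivals are given as timestamps and are counted over consecutive non-overlapping windows. The helper names are hypothetical.

```python
import random
from statistics import fmean, pvariance


def window_counts(arrival_times: list[float], window: float, horizon: float) -> list[int]:
    """Number of arrivals in each consecutive window [i*t, (i+1)*t) up to the horizon."""
    counts = [0] * int(horizon / window)
    for t in arrival_times:
        idx = int(t / window)
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts


def idc(arrival_times: list[float], window: float, horizon: float) -> float:
    """Index of dispersion for counts: Var[N(t)] / E[N(t)]."""
    counts = window_counts(arrival_times, window, horizon)
    return pvariance(counts) / fmean(counts)


# Poisson arrivals with rate 100 per unit time over a horizon of 1000 time units.
random.seed(2)
times, t = [], random.expovariate(100.0)
while t < 1000.0:
    times.append(t)
    t += random.expovariate(100.0)
print(f"IDC(0.1) for Poisson arrivals: {idc(times, 0.1, 1000.0):.2f}")
```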

To address this problem, this paper proposes a neural-network-based traffic prediction approach. The neural network can predict the bit rate variations of a complex stochastic process and capture the probability density function (pdf) of the traffic. We show that the neural prediction is accurate enough to characterize the actual traffic. It can therefore be used in the policing and CAC functions, as well as in some feedback control schemes. It has been argued that traditional reactive congestion control is not suitable for ATM networks because of the effect of propagation delays on high-speed channels. Recently, Amenyo et al. [Amenyo 91] proposed a new congestion control scheme called proactive control. Its feasibility and effectiveness rest on predictions of the correlated input traffic streams into network nodes, which are used to obviate the problem of propagation delays. Our neural prediction method can therefore be applied to this framework as well.

Figure 1: Static multilayer perceptron used as a nonlinear predictor.


2  FIR  Neural  Network 

Neural networks are well known to be capable of performing nonlinear mappings between real-valued inputs and outputs. A three-layered feedforward neural network (multilayer perceptron) with sigmoidal units in the hidden layer can approximate any continuous nonlinear function to any desired degree of accuracy, given enough hidden units [Funahashi 89, Hornik 89]. This kind of neural network is trained with the backpropagation (BP) algorithm. One limitation of the standard BP algorithm is that it can only learn a static input-output mapping. A static mapping of this kind is well suited to pattern recognition applications, where both the input and output vectors represent spatial patterns that are independent of time [Haykin 94].

The standard BP algorithm may also be used to perform nonlinear prediction on a stationary time series [Lapedes 87]. We may use a static multilayer perceptron, as depicted in Figure 1, where the input elements labeled $z^{-1}$ represent unit delays. The input vector $\mathbf{x}$ is defined in terms of the past samples $x(n-1), x(n-2), \ldots, x(n-q)$ as follows:

$$\mathbf{x} = [x(n-1), x(n-2), \ldots, x(n-q)]^T \qquad (1)$$

where $q$ is the prediction order. The scalar output $y(n)$ of the multilayer perceptron thus equals the one-step prediction $\hat{x}(n)$:

$$y(n) = \hat{x}(n) \qquad (2)$$

The actual value $x(n)$ of the input signal represents the desired response.
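The tapped delay line of Figure 1 turns a series into (input, target) pairs for the static predictor of Eq. 1. The minimal plain-Python sketch below uses 0-based indices, so the first target is at index q. The function name is hypothetical.

```python
def delay_embed(series: list[float], q: int) -> tuple[list[list[float]], list[float]]:
    """Inputs [x(n-1), ..., x(n-q)] (Eq. 1) paired with desired responses x(n)."""
    if q < 1 or q >= len(series):
        raise ValueError("need 1 <= q < len(series)")
    inputs = [[series[n - j] for j in range(1, q + 1)] for n in range(q, len(series))]
    targets = [series[n] for n in range(q, len(series))]
    return inputs, targets


X, d = delay_embed([1.0, 2.0, 3.0, 4.0, 5.0], q=2)
print(X)  # [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
print(d)  # [3.0, 4.0, 5.0]
```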

To capture the dynamic properties of time-varying signals, however, the design of the multilayer perceptron must be extended to represent time. One such method is the so-called Time Delay Neural Network (TDNN), first used in [Lang 88] for speech recognition. The TDNN is a multilayer feedforward network in which the outputs of a layer are buffered for several time steps and then fed, fully connected, to the next layer. It was devised to capture explicitly the concept of time symmetry as encountered in the recognition of an isolated phoneme using a spectrogram.

The  TDNN  topology  is  in  fact  embodied  in  a  multilayer  perceptron  in  which  each  synapse 
is  represented  by  a  Finite  Impulse  Response  (FIR)  filter.  This  latter  neural  network  is 
referred  to  as  a  FIR  multilayer  perceptron,  which  can  be  trained  with  an  efficient  algorithm 
called  temporal  backpropagation  [Wan  94].  It  can  be  shown  that  the  TDNN  and  the  FIR 
network  are  functionally  equivalent.  However,  the  FIR  network  is  more  easily  related 
to  a  standard  multilayer  network  as  a  simple  temporal  or  vector  extension.  The  FIR 
representation  also  leads  to  a  more  desirable  adaptation  scheme.  So  in  this  paper,  we 
adopt  this  kind  of  FIR  network  as  our  traffic  predictor. 

2.1  FIR  Network  Model 

As mentioned above, the traditional model of a multilayer perceptron forms a static mapping with no internal dynamics. The basic neuron is modified by replacing each synaptic weight with a FIR linear filter. By FIR we mean that, for an input excitation of finite duration, the output of the filter will also be of finite duration. For this filter, the output $y(k)$ equals a weighted sum of the current and past delayed values of the input:

$$y(k) = \sum_{n=0}^{T} w(n)\, x(k-n) \qquad (3)$$
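Eq. 3 is an ordinary FIR convolution. The minimal plain-Python sketch below assumes the input is zero before time 0. The impulse-response example shows the finite impulse response property: a unit impulse produces exactly the T + 1 weights, followed by zeros.

```python
def fir_filter(weights: list[float], signal: list[float]) -> list[float]:
    """Eq. 3: y(k) = sum_{n=0}^{T} w(n) x(k - n), taking x(k) = 0 for k < 0."""
    out = []
    for k in range(len(signal)):
        acc = 0.0
        for n, w in enumerate(weights):  # n = 0, ..., T
            if k - n >= 0:
                acc += w * signal[k - n]
        out.append(acc)
    return out


print(fir_filter([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0, 0.0]))  # [1.0, 2.0, 3.0, 0.0, 0.0]
```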

On the basis of Eq. 3, we may formulate the model of a FIR neuron as follows. Let $w_{ji}(l)$ denote the weight connected to the $l$th tap of the FIR filter modeling the synapse that connects the output of neuron $i$ to neuron $j$ ($i = 1, 2, \ldots, p$). The index $l$ ranges from 0 to $M$, where $M$ is the total number of delay units built into the design of the FIR filter. Let $y_j(n)$ denote the output signal of neuron $j$ and $x_i(n)$ its $i$th input signal. Hence, we have

$$v_j(n) = \sum_{i=1}^{p} \sum_{l=0}^{M} w_{ji}(l)\, x_i(n-l) - \theta_j \qquad (4)$$

$$y_j(n) = \varphi(v_j(n)) \qquad (5)$$

where $v_j(n)$ is the net activation potential of neuron $j$, $\theta_j$ is the externally applied threshold and $\varphi(\cdot)$ is the nonlinear activation function of the neuron.

We  may  rewrite  Eq.  4  and  Eq.  5  in  matrix  form  by  introducing  the  following  definitions 
for  the  state  vector  and  weight  vector  for  synapse  i,  respectively: 

$$\mathbf{x}_i(n) = [x_i(n), x_i(n-1), \ldots, x_i(n-M)]^T \qquad (6)$$

$$\mathbf{w}_{ji} = [w_{ji}(0), w_{ji}(1), \ldots, w_{ji}(M)]^T \qquad (7)$$


Figure  2:  Dynamic  model  of  a  neuron,  incorporating  synaptic  FIR  filters. 


Figure  3:  Signal-flow  graph  of  a  synaptic  FIR  filter. 

We may thus express the output $y_j(n)$ of neuron $j$ by the following equation:

$$y_j(n) = \varphi\left(\sum_{i=1}^{p} \mathbf{w}_{ji}^T \mathbf{x}_i(n) - \theta_j\right) \qquad (8)$$

This FIR model of a single artificial neuron is shown in Figure 2, where the weight $w_{j0}$ connected to the fixed input $x_0 = -1$ represents the threshold $\theta_j$. The signal-flow graph representation of a FIR filter is shown in Figure 3.
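Eqs. 6 to 8 can be traced in code. Each synapse keeps a tap-delay state vector, and the neuron applies φ to the sum of the synaptic dot products minus the threshold. In this minimal sketch, `tanh` stands in for φ and the helper names are hypothetical.

```python
import math
from typing import Callable


def shift_in(state: list[float], new_input: float) -> list[float]:
    """Update x_i(n) = [x_i(n), ..., x_i(n-M)] when a new sample arrives (Eq. 6)."""
    return [new_input] + state[:-1]


def fir_neuron_output(
    weights: list[list[float]],
    states: list[list[float]],
    theta: float,
    phi: Callable[[float], float] = math.tanh,
) -> float:
    """Eq. 8: y_j(n) = phi(sum_i w_ji^T x_i(n) - theta_j)."""
    v = sum(
        sum(w * x for w, x in zip(w_ji, x_i))  # w_ji^T x_i(n)
        for w_ji, x_i in zip(weights, states)
    )
    return phi(v - theta)


# Two synapses with M = 1 (two taps each).
weights = [[1.0, 0.5], [2.0, 0.0]]
states = [[1.0, 2.0], [0.5, 1.0]]
print(fir_neuron_output(weights, states, theta=1.0, phi=lambda v: v))  # 2.0
```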

We  may  construct  a  multilayer  perceptron  whose  hidden  and  output  neurons  are  all  based 
on  the  above  FIR  model.  Such  a  neural  network  structure  can  be  referred  to  as  a  FIR 
multilayer  perceptron.  The  difference  between  the  FIR  multilayer  perceptron  and  the 
standard  one  is  that  the  static  forms  of  the  synaptic  connections  between  the  neurons  in 
the  various  layers  of  the  network  are  replaced  by  their  dynamic  versions  (i.e.,  scalars  are 
replaced  by  vectors  and  multiplications  by  vector  products). 
2.2 Temporal Backpropagation Learning

Given an input sequence $\mathbf{x}(k)$, the network produces the output sequence $\mathbf{y}(k) = \mathcal{N}[\mathbf{W}, \mathbf{x}(k)]$, where $\mathbf{W}$ represents the set of all filter coefficients in the network. Define the instantaneous error $e^2(k) = \|\mathbf{d}(k) - \mathbf{y}(k)\|^2$ as the squared Euclidean distance between the network output $\mathbf{y}(k)$ and the desired output $\mathbf{d}(k)$. The objective of training is therefore to minimize over $\mathbf{W}$ the cost function

$$C = \sum_{k=1}^{K} e^2(k)$$

where the sum is taken over all $K$ points in the training sequence. In [Wan 94], an algorithm called temporal backpropagation is proposed to minimize $C$. The weight update is given by the following pair of relations:

$$\mathbf{w}_{ji}(k+1) = \mathbf{w}_{ji}(k) + \eta\,\delta_j(k)\,\mathbf{x}_i(k) \qquad (9)$$

$$\delta_j(k) = \begin{cases} e_j(k)\,\varphi'(v_j(k)), & \text{neuron } j \text{ in the output layer} \\ \varphi'(v_j(k)) \displaystyle\sum_{m \in \mathcal{A}} \boldsymbol{\Delta}_m^T(k)\,\mathbf{w}_{mj}, & \text{neuron } j \text{ in a hidden layer} \end{cases} \qquad (10)$$

where $\eta$ is the learning-rate parameter, $\mathcal{A}$ is the set of all neurons whose inputs are fed by neuron $j$ in a forward manner, and $\boldsymbol{\Delta}_m(k)$ is defined as follows:

$$\boldsymbol{\Delta}_m(k) = [\delta_m(k), \delta_m(k+1), \ldots, \delta_m(k+M)]^T \qquad (11)$$

The above equations clearly represent a vector generalization of the standard backpropagation algorithm. Indeed, if we replace the input vector $\mathbf{x}_i(k)$, the weight vector $\mathbf{w}_{mj}$ and the local gradient vector $\boldsymbol{\Delta}_m$ by their scalar counterparts, the temporal backpropagation algorithm reduces to standard backpropagation for static networks. To calculate $\delta_j(k)$ for a neuron $j$ located in a hidden layer, we filter the $\delta$'s from the next layer backwards through the FIR synapses that the given neuron feeds (see Figure 4). The $\delta$'s are thus formed by backward filtering rather than by simply taking weighted sums. For each new input and desired response vector, the forward filters and the backward filters are each advanced one time step. The weights are then adapted on-line at each time increment.
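The backward filtering in the hidden-layer case can be written out directly from the definition of $\boldsymbol{\Delta}_m(k)$. Each $\delta_m$ sequence from the next layer is filtered through the same FIR weights $\mathbf{w}_{mj}$, but forward in time, which is why the computation needs future values. This minimal sketch uses the noncausal form, with hypothetical names. Values of $\delta$ beyond the end of the available sequence are treated as zero.

```python
def hidden_delta(
    k: int,
    phi_prime_vj: float,
    next_deltas: list[list[float]],
    next_weights: list[list[float]],
) -> float:
    """delta_j(k) = phi'(v_j(k)) * sum_{m in A} Delta_m(k)^T w_mj (hidden-layer case).

    next_deltas[m] holds the sequence delta_m(0), delta_m(1), ... for neuron m in A;
    next_weights[m] is w_mj = [w_mj(0), ..., w_mj(M)];
    Delta_m(k) = [delta_m(k), ..., delta_m(k + M)] uses future values (noncausal).
    """
    total = 0.0
    for delta_m, w_mj in zip(next_deltas, next_weights):
        for l, w in enumerate(w_mj):
            if k + l < len(delta_m):  # values past the end are treated as zero
                total += delta_m[k + l] * w
    return phi_prime_vj * total


print(hidden_delta(0, 1.0, [[1.0, 1.0, 1.0]], [[1.0, 2.0]]))  # 3.0
```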

Temporal backpropagation preserves the symmetry between the forward propagation of states and the backward propagation of error terms, and thereby maintains the sense of parallel distributed processing. Furthermore, each unique weight of a synaptic filter is used only once in the computation of the $\delta$'s. This avoids the redundant use of terms found in the instantaneous gradient model.

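However, careful inspection of the above equations reveals that the calculations of the $\delta_j(k)$'s are noncausal. By its definition, $\boldsymbol{\Delta}_m(k)$ contains the future values $\delta_m(k+1), \ldots, \delta_m(k+M)$. A simple reindexing yields the causal form of the temporal backpropagation algorithm: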


Figure 4: Backpropagation of local gradients through an FIR multilayer perceptron.


For neuron $j$ in the output layer, compute

$$\mathbf{w}_{ji}(k+1) = \mathbf{w}_{ji}(k) + \eta\,\delta_j(k)\,\mathbf{x}_i(k) \qquad (12)$$

$$\delta_j(k) = e_j(k)\,\varphi'(v_j(k)) \qquad (13)$$

For neuron $j$ in a hidden layer, compute

$$\mathbf{w}_{ji}(k+1) = \mathbf{w}_{ji}(k) + \eta\,\delta_j(k-lM)\,\mathbf{x}_i(k-lM) \qquad (14)$$

$$\delta_j(k-lM) = \varphi'(v_j(k-lM)) \sum_{m \in \mathcal{A}} \boldsymbol{\Delta}_m^T(k-lM)\,\mathbf{w}_{mj} \qquad (15)$$

where $M$ is the total synaptic filter length, and the index $l$ identifies the hidden layer in question. Specifically, $l = 1$ corresponds to one layer back from the output layer, $l = 2$ to two layers back from the output layer, and so on.


3  ATM  Traffic  Prediction  Using  FIR  Networks 

Neural networks have an adaptation capability that can accommodate nonstationarity, and their generalization capability makes them flexible and robust when facing new and noisy data patterns. Once training is completed, a neural network can be computationally inexpensive even if it continues to adapt on-line. Indeed, neural networks have already been used in call control, switch control and routing [Morris 94, Hiramatsu 90]. Here we use a FIR neural network as a multimedia traffic predictor in ATM networks. The role of the neural network is to capture the unknown complex relation between past and future values of the traffic.

3.1  Predictor  Training  Configuration 

Consider a scalar time series denoted by $x(n)$, which is described by a nonlinear regressive model of order $q$ as follows [Haykin 94]:

$$x(n) = f(x(n-1), x(n-2), \ldots, x(n-q)) + e(n) \qquad (16)$$

Figure 5: Training scheme of the FIR network.

where $f$ is a nonlinear function of its arguments and $e(n)$ is a residual, assumed to be drawn from a white Gaussian process. The nonlinear function $f$ is unknown. All that is available is a set of observables $x(1), x(2), \ldots, x(N)$, where $N$ is the total length of the time series. We may use a FIR multilayer perceptron as a one-step predictor of some order $q$ to model the time series, as shown in Figure 5. Specifically, the network is designed to predict the sample $x(n)$ given the past $q$ samples $x(n-1), x(n-2), \ldots, x(n-q)$, as shown by

$$x(n) = F(x(n-1), x(n-2), \ldots, x(n-q)) + e(n) \qquad (17)$$

The nonlinear function $F$, computed by the FIR multilayer perceptron, is the approximation of the unknown function $f$. The actual sample value $x(n)$ acts as the desired response. The FIR multilayer perceptron is therefore trained to minimize the squared value of the prediction error, where $\hat{x}(n) = F(x(n-1), \ldots, x(n-q))$ denotes the network output:

$$e(n) = x(n) - \hat{x}(n), \qquad q+1 \le n \le N. \qquad (18)$$

In the neural network literature this training scheme is referred to as teacher forcing, while in the control and signal processing literature it is referred to as equation-error adaptation.
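Teacher forcing means the predictor is always fed the actual past samples, never its own earlier predictions. The minimal sketch below computes the resulting MSE of the errors in Eq. 18. It uses 0-based indices, so n runs from q to N - 1, and an arbitrary `predict` callable stands in for the trained network F.

```python
from typing import Callable


def teacher_forced_mse(
    series: list[float], q: int, predict: Callable[[list[float]], float]
) -> float:
    """Mean of e(n)^2 with e(n) = x(n) - F(x(n-1), ..., x(n-q)), using actual past samples."""
    errors = []
    for n in range(q, len(series)):
        past = [series[n - j] for j in range(1, q + 1)]  # actual samples, not predictions
        errors.append(series[n] - predict(past))
    return sum(e * e for e in errors) / len(errors)


# A naive "repeat the last sample" predictor on a ramp is off by 1 at every step.
print(teacher_forced_mse([1.0, 2.0, 3.0, 4.0], q=1, predict=lambda p: p[0]))  # 1.0
```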

In our application, the FIR multilayer perceptron is designed as a 1-5-1 fully connected feedforward network with 3:3 taps per layer. These dimensions were selected mostly by trial and error. In general, the choice of dimensions for neural networks remains an open question in need of further research. The FIR network is trained with the causal form of temporal backpropagation, and the mean-squared error (MSE) is used as the performance measure. To increase the rate of learning while avoiding the danger of instability, a momentum term is added to the weight-update equation:

$$\Delta\mathbf{w}_{ji}(k) = \alpha\,\Delta\mathbf{w}_{ji}(k-1) + \eta\,\delta_j(k)\,\mathbf{x}_i(k) \qquad (19)$$

where $\alpha$ is a positive number called the momentum constant. The learning rate $\eta$ and the momentum constant $\alpha$ are both initially set to 0.1. The BP learning algorithm has been found to learn faster when the sigmoidal activation function of the neuron model is antisymmetric, i.e. $\varphi(-v) = -\varphi(v)$, than when it is nonsymmetric. We therefore adopt the hyperbolic tangent activation function in the hidden layer, defined by

$$\varphi(v) = a \tanh(bv)$$

where $a = 1.716$ and $b = 2/3$. In some of our experiments, we have also used heuristics that accelerate the convergence of backpropagation learning through learning-rate adaptation [Haykin 94]. Simulations have been performed to obtain the neural network data sets for both training and testing (cross-validation). The output neuron uses the logistic function $\varphi(v) = 1/(1 + \exp(-v))$, whose range is (0, 1), so the traffic data must be normalized so that all values fall between 0 and 1.
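The training ingredients of this paragraph can be sketched per synaptic filter: the momentum update of Eq. 19, the scaled tanh of the hidden layer, the logistic output and a normalization of the data to [0, 1]. Min-max scaling is an assumption, since the text only says that the data are normalized. The function names are hypothetical.

```python
import math

ETA, ALPHA = 0.1, 0.1    # initial learning rate and momentum constant from the text
A, B = 1.716, 2.0 / 3.0  # constants of the hidden-layer activation


def momentum_update(weights, prev_update, delta_j, x_i, eta=ETA, alpha=ALPHA):
    """Eq. 19 tap by tap: dw(k) = alpha*dw(k-1) + eta*delta_j(k)*x_i(k); returns (w, dw)."""
    update = [alpha * du + eta * delta_j * x for du, x in zip(prev_update, x_i)]
    return [w + du for w, du in zip(weights, update)], update


def hidden_activation(v):
    """Antisymmetric hidden-layer activation phi(v) = a * tanh(b * v)."""
    return A * math.tanh(B * v)


def output_activation(v):
    """Logistic output activation with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def minmax_normalize(values):
    """Map the traffic data linearly onto [0, 1] (assumed scaling)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


w, dw = momentum_update([0.0, 0.0], [0.0, 0.0], delta_j=1.0, x_i=[1.0, 2.0])
w, dw = momentum_update(w, dw, delta_j=1.0, x_i=[1.0, 2.0])
print(w)  # approximately [0.21, 0.42]
```

With a = 1.716 and b = 2/3, the hidden activation gives φ(1) ≈ 1 and φ(-1) ≈ -1.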

3.2  Traffic  Models 

In  this  section,  we  briefly  describe  the  models  for  video  arrival  process  and  voice  arrival 
process  used  in  our  experiments. 

3.2.1  Video  Arrival  Process  Model 

Video  is  presented  to  users  as  a  series  of  frames  in  which  the  motion  of  the  scene  is 
reflected  in  small  changes  in  sequentially  displayed  frames.  Video  frames  are  generated  at 
a  constant  rate  defined  by  the  playout  rate.  As  the  amount  of  data  transmitted  per  frame 
varies  due  to  intraframe  and  interframe  coding,  video  applications  generate  traffic  in  a 
continuous  manner  at  varying  rates.  Video  is  a  relatively  new  service  in  communication 
networks  and  its  traffic  characteristics  are  not  well  understood.  It  is  also  quite  different 
from  voice  or  data  in  that  its  bit  streams  exhibit  various  types  of  correlations  between 
consecutive  frames. 

The characteristics of the video signal depend primarily on two factors: 1) the nature of the video scene, and 2) the type of VBR coding technique employed (e.g., motion-compensated discrete cosine transform, interframe DPCM, etc.). For simplicity, this paper focuses on video services with uniform-activity-level scenes, i.e., scenes in which the information content of consecutive frames changes little [Onvural 94]. A typical application of this type is video telephony, where the screen shows a person talking. In general, correlations in video services with uniform activity levels last for a short time and decay exponentially. The simulation model used to generate this kind of coded video traffic is a continuous-state, discrete-time stochastic process. [Maglaris 88] proposes a first-order autoregressive (AR) Markov model, which estimates the bit rate at the $n$th frame from the bit rate at the $(n-1)$st frame as

$$\lambda(n) = a\,\lambda(n-1) + b\,w(n) \qquad (20)$$

where $\lambda(n)$ denotes the bit rate of the $n$th frame in bits/pixel, $a$ and $b$ are constants, and $w(n)$ is a Gaussian random variable with mean $m$ and variance 1. There are about 250000 pixels per frame and 30 frames/s, so 1 bit/pixel corresponds to 7.5 Mbit/s. The mean $E(\lambda)$ and the autocovariance $C(n)$ of the bit rate are equal to

$$E(\lambda) = \frac{bm}{1-a} \qquad (21)$$

$$C(n) = \frac{b^2 a^n}{1-a^2} \qquad (22)$$

These relations can be used to determine the three unknown parameters $a$, $b$ and $m$:

$$a = 0.8781, \quad b = 0.1108, \quad m = 0.572 \qquad (23)$$

Figure 6: IPP model.

The  model  is  found  to  be  quite  accurate  compared  with  the  actual  measurements  and  is 
suitable  for  simulation  studies. 
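Eqs. 20 to 23 are enough to generate a synthetic video trace. This minimal sketch assumes independent Gaussian w(n) and starts from λ(0) = 0.6, the starting value used later in the experiments. The long-run sample mean should approach E(λ) = bm/(1 - a) ≈ 0.52 bits/pixel, or about 3.9 Mbit/s at 7.5 Mbit/s per bit/pixel. The helper names are hypothetical.

```python
import random

A, B, M = 0.8781, 0.1108, 0.572              # Eq. 23
MBPS_PER_BIT_PER_PIXEL = 250_000 * 30 / 1e6  # 250000 pixels/frame x 30 frames/s -> 7.5


def ar1_video_trace(n_frames, lam0=0.6, a=A, b=B, m=M, seed=None):
    """Eq. 20: lambda(n) = a*lambda(n-1) + b*w(n), with w(n) ~ N(m, 1), in bits/pixel."""
    rng = random.Random(seed)
    trace = [lam0]
    for _ in range(n_frames - 1):
        trace.append(a * trace[-1] + b * rng.gauss(m, 1.0))
    return trace


def stationary_mean(a=A, b=B, m=M):
    """Eq. 21: E(lambda) = b*m / (1 - a)."""
    return b * m / (1.0 - a)


def autocovariance(n, a=A, b=B):
    """Eq. 22: C(n) = b^2 * a^n / (1 - a^2)."""
    return b * b * a ** n / (1.0 - a * a)


trace = ar1_video_trace(200_000, seed=7)
sample_mean = sum(trace) / len(trace)
print(f"sample mean {sample_mean:.3f} vs E(lambda) {stationary_mean():.3f} bits/pixel")
print(f"mean rate about {stationary_mean() * MBPS_PER_BIT_PER_PIXEL:.1f} Mbit/s")
```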


3.2.2  Voice  Arrival  Process  Model 

A voice source alternates between talk spurts (active) and silent periods. To achieve higher resource utilization, speech activity detection may be used at the VBR voice source so that voice packets are generated only when the source is active, thereby increasing the transmission efficiency. The correlated generation of voice packets within a call can be modeled by an Interrupted Poisson Process (IPP). In an IPP model, each voice source is characterized by alternating ON (talk spurt) and OFF (silence) periods. During the ON period, the interarrival times of packets are exponentially distributed (i.e., packets arrive in a Poisson manner), while no packets are generated during the OFF period. The transition from ON to OFF occurs with rate β, and the transition from OFF to ON with rate α (see Figure 6). Hence the ON and OFF periods are exponentially distributed with means 1/β and 1/α, respectively. To specify this model completely, we assume that the packet generation during the active period corresponds to 32 kbit/s, the mean talk spurt is 1/β = 352 ms and the mean silence period is 1/α = 650 ms.
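A minimal simulation of the IPP voice model is sketched below. It assumes (this is not stated in the text) that one packet carries one 48-byte ATM cell payload, so that 32 kbit/s corresponds to about 83.3 packets/s during a talk spurt. Arrivals are counted per sampling window of length Ts, which is the count process used in Experiment 2; the function names are hypothetical.

```python
import random

RATE_BIT_S = 32_000        # bit rate during a talk spurt (from the text)
MEAN_ON = 0.352            # 1/beta, mean talk spurt [s]
MEAN_OFF = 0.650           # 1/alpha, mean silence period [s]
PAYLOAD_BITS = 48 * 8      # ASSUMPTION: one packet = one 48-byte ATM cell payload
RATE_ON = RATE_BIT_S / PAYLOAD_BITS  # Poisson packet rate while ON [packets/s]


def activity_factor(mean_on: float, mean_off: float) -> float:
    """Long-run fraction of time the source is ON: alpha / (alpha + beta)."""
    return mean_on / (mean_on + mean_off)


def ipp_counts(duration: float, ts: float, rate_on: float,
               mean_on: float, mean_off: float, seed: int = 1) -> list:
    """Number of IPP packet arrivals in consecutive windows of length ts."""
    rng = random.Random(seed)
    n_bins = int(round(duration / ts))
    counts = [0] * n_bins
    on = rng.random() < activity_factor(mean_on, mean_off)  # start in steady state
    t = 0.0
    while t < duration:
        # Exponential sojourn in the current state (mean 1/beta or 1/alpha)
        sojourn = rng.expovariate(1.0 / (mean_on if on else mean_off))
        if on:
            # Poisson arrivals with rate rate_on inside [t, t + sojourn)
            a = t + rng.expovariate(rate_on)
            while a < t + sojourn and a < duration:
                idx = int(a // ts)
                if idx < n_bins:
                    counts[idx] += 1
                a += rng.expovariate(rate_on)
        t += sojourn
        on = not on
    return counts


if __name__ == "__main__":
    p_on = activity_factor(MEAN_ON, MEAN_OFF)
    print(f"activity factor = {p_on:.3f}, mean rate = {p_on * RATE_BIT_S / 1000:.2f} kbit/s")
    counts = ipp_counts(3600.0, 0.05, RATE_ON, MEAN_ON, MEAN_OFF)  # Ts = 50 ms
    print(f"mean packets per window = {sum(counts) / len(counts):.3f}")
```

The activity factor α/(α + β) = 352/1002 ≈ 0.35 gives a long-run mean rate of about 11.2 kbit/s per source, independent of the packet-size assumption.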


4  Simulations 

In this section, we demonstrate the effectiveness of the neural network as a traffic predictor. Extensive simulations have been performed. The packet arrival process is generated from packetized video sources and/or packetized voice sources according to the models discussed in the previous section. We have used different data for the training sets and test sets by choosing different initial values of the arrival process or different seeds of the random number generator. For example, for the video arrival process, we have generated 2000 traffic data elements, starting with λ(0) = 0.6. The first 400 elements have been chosen as the training set, and the next 1600 as the test set. We have also tried a more complex 1-10-1 network model on the same data sets, but no significant performance improvement was observed. The MSE values of this experiment are summarized in Table 1. Further prediction results for the test sets are reported below.

FIR network    MSE (training set)    MSE (test set)
1-5-1          0.00414               0.00423
1-10-1         0.00410               0.00415

Table 1: MSE of the experiment for video traffic.

Figure 7: Prediction results for the bit rate of the video traffic.

Experiment 1: In this experiment, we use three video sources. The FIR network is used to predict the bit rate of the superposed video arrival process over the next frame, so the lag time is 1/30 s, the frame generation interval. The prediction results are shown in Figure 7 and Figure 8, illustrating that the predicted traffic has almost the same statistical characteristics as the actual traffic.
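Figures 8, 10 and 12 compare autocorrelation functions of predicted and actual traffic. The estimator is not stated in the text; the sketch below assumes the common biased sample autocorrelation and applies it to the superposition of three independent AR(1) video sources, as in Experiment 1. For independent, identical AR(1) sources the aggregate autocorrelation at lag k is a^k, which serves as a check. Function names are hypothetical.

```python
import random


def superposed_video(n_sources: int, n_frames: int, a: float = 0.8781,
                     b: float = 0.1108, m: float = 0.572, seed: int = 3) -> list:
    """Aggregate bit rate of n_sources independent AR(1) sources, eqs. (20) and (23)."""
    rng = random.Random(seed)
    lams = [b * m / (1.0 - a)] * n_sources  # start every source at its stationary mean
    total = []
    for _ in range(n_frames):
        lams = [a * lam + b * rng.gauss(m, 1.0) for lam in lams]
        total.append(sum(lams))
    return total


def sample_acf(xs: list, max_lag: int) -> list:
    """Biased sample autocorrelation r(k) for k = 0..max_lag."""
    n = len(xs)
    mu = sum(xs) / n
    dev = [x - mu for x in xs]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


if __name__ == "__main__":
    acf = sample_acf(superposed_video(3, 20_000), 5)
    for k, r in enumerate(acf):
        print(f"lag {k} ({k / 30:.3f} s): sample {r:.3f}, model {0.8781 ** k:.3f}")
```

Applying the same `sample_acf` to a predicted series and to the actual series gives the kind of comparison shown in the autocorrelation figures.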
Figure 8: Comparison of the autocorrelation function of the predicted video traffic and the actual one.

Figure 9: Prediction results for the arrival process of voice sources.

Figure 10: Comparison of the autocorrelation of the number of packet arrivals of the predicted voice traffic and the actual one.


Experiment 2: In this experiment, we use three voice sources. Here the time series x(n) is used to represent the count process N(0, t), which measures the number of packet arrivals in the interval (0, t). The arrival process is sampled at every sampling period Ts. The choice of Ts is influenced by the type of traffic and should guarantee that the sampled version of the arrival process captures all correlations contained in the actual process. In this application Ts has been found to be 50 ms [Tarraf 94]. Figure 9 and Figure 10 show that the neural network prediction is very close to the actual traffic values.

Experiment 3: In this experiment, one video source and three voice sources are used to generate a heterogeneous superposition arrival process. Prediction results for the count process are shown in Figure 11 and Figure 12. It should be pointed out that more training iterations of the neural network are needed in this experiment than in the previous ones, since heterogeneous traffic is more difficult to characterize.

Figure 11: Prediction results for the heterogeneous traffic.

Figure 12: Comparison of the autocorrelation function of the predicted traffic and the actual one in experiment 3.

Figure 13: Prediction results for the training set of the chaotic time series.

Experiment 4: Recently, Leland et al. demonstrated that Ethernet local area network traffic is statistically self-similar [Leland 93]. To capture this fractal behaviour, they proposed modelling the traffic with deterministic chaotic maps. Chaos is a dynamical system phenomenon in which simple, low-order, nonlinear deterministic equations can produce behaviour that mimics random processes. To illustrate the underlying idea, consider a nonlinear map f(·) that describes the evolution of a state variable x(n) ∈ (0, 1) over discrete time as x(n + 1) = f(x(n)). The packet generation process for an individual source can then be modeled by stipulating that the source generates one packet or no packet at time n depending on whether x(n) is above or below an appropriately chosen threshold. If f is a chaotic map, the resulting packet process can mimic complex packet traffic phenomena. Once an appropriate chaotic map has been derived from a set of traffic measurements, generating a packet stream for an individual source is generally quick and easy. On the other hand, deriving an appropriate nonlinear chaotic map from a set of actual traffic measurements currently requires considerable guesswork and experimentation. Nevertheless, studying arrival streams to queues that are generated by nonlinear chaotic maps may well provide new insight into the performance of queueing systems whose arrival processes exhibit fractal properties.
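The threshold construction described above can be sketched in a few lines. The text does not fix a particular map, so, purely as an assumption for illustration, the sketch uses the logistic map of eq. (24) with a hypothetical threshold d = 0.75. Under the map's invariant density 1/(π√(x(1 − x))), the long-run fraction of slots carrying a packet is 1 − (2/π)·arcsin(√d), which equals 1/3 for d = 0.75.

```python
import math


def logistic(x: float) -> float:
    """Fully chaotic logistic map f(x) = 4x(1 - x), eq. (24); an assumed example map."""
    return 4.0 * x * (1.0 - x)


def chaotic_packet_source(x0: float, n_slots: int, threshold: float) -> list:
    """1 if the source emits a packet in slot n (x(n) above threshold), else 0."""
    x, packets = x0, []
    for _ in range(n_slots):
        packets.append(1 if x > threshold else 0)
        x = logistic(x)  # deterministic state update x(n+1) = f(x(n))
    return packets


def long_run_packet_fraction(threshold: float) -> float:
    """P(x > d) under the invariant density 1 / (pi * sqrt(x (1 - x))) of the logistic map."""
    return 1.0 - (2.0 / math.pi) * math.asin(math.sqrt(threshold))


if __name__ == "__main__":
    d = 0.75  # hypothetical threshold
    packets = chaotic_packet_source(0.1234, 50_000, d)
    print(f"simulated {sum(packets) / len(packets):.4f}, "
          f"predicted {long_run_packet_fraction(d):.4f}")
```

The stream is fully determined by x0, which is what makes generating it "quick and easy" once a map is known; fitting f to measurements is the hard part.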

As a further experiment, we train the FIR network to perform one-step prediction of a chaotic time series generated by the so-called logistic map [Rasband 90]:

$$x(n+1) = 4\,x(n)\,\bigl(1 - x(n)\bigr) \qquad (24)$$

where the values of x(n) all lie in the range (0, 1). The prediction results for the training and test data sets are encouraging, as shown in Figure 13 and Figure 14, respectively.
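Why a nonlinear predictor is needed here can be shown with a short illustration (not the FIR experiment itself): the logistic-map series (24) is essentially uncorrelated at lag 1, so the best linear one-step predictor does little better than predicting the mean, whereas the map itself predicts x(n + 1) exactly. The 20/80 training/test split mirrors the 400/1600 split used for the video data; the series length and starting value are assumptions, and the function names are hypothetical.

```python
def logistic_series(x0: float, n: int) -> list:
    """x(n+1) = 4 x(n) (1 - x(n)), eq. (24)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs


def fit_linear_predictor(xs: list) -> tuple:
    """Least-squares fit of x(n+1) ~ c0 + c1 * x(n) on a training series."""
    u, v = xs[:-1], xs[1:]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    c1 = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / sum((p - mu) ** 2 for p in u)
    return mv - c1 * mu, c1


def one_step_mse(xs: list, predict) -> float:
    """Mean squared error of predict(x(n)) against x(n+1)."""
    errs = [(predict(p) - q) ** 2 for p, q in zip(xs[:-1], xs[1:])]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    series = logistic_series(0.3, 20_000)
    train, test = series[:4_000], series[4_000:]  # 20/80 split
    c0, c1 = fit_linear_predictor(train)
    mse_linear = one_step_mse(test, lambda x: c0 + c1 * x)
    mse_map = one_step_mse(test, lambda x: 4.0 * x * (1.0 - x))
    print(f"lag-1 slope {c1:+.3f}, linear MSE {mse_linear:.4f}, map MSE {mse_map:.1e}")
```

The linear predictor's test MSE comes out close to 1/8, the variance of the map's invariant density, while the map-based predictor's error is zero; a neural network that learns f(·) from data can approach the latter.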
Figure 14: Prediction results for the test set of the chaotic time series.

5  Conclusion 

In this paper, we have shown that a FIR network constitutes a powerful tool for ATM traffic prediction. The theoretical justification of this approach is that neural networks are capable of approximating any continuous function and of performing non-parametric regression. Furthermore, a FIR neural network extends the standard multilayer perceptron to a temporal processing version which is better suited to modeling time series. Once trained, the neural network can successfully learn the actual pdf of the offered traffic (instead of approximating it by simple parameters such as the peak and mean bit rates). Hence the neural network can be used as an effective traffic descriptor.
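The extension of the multilayer perceptron can be made concrete with a forward pass: in the FIR network of [Wan 94] every synapse is an FIR filter (a tapped delay line) instead of a single weight. The sketch below assumes a 1-H-1 network with tanh hidden units and a linear output unit; the number of taps per synapse and all weights are hypothetical, and training (temporal backpropagation in [Wan 94]) is not shown.

```python
import math
import random
from typing import List, Sequence


def fir_synapse(w: Sequence[float], s: Sequence[float], n: int) -> float:
    """FIR synapse output at time n: sum_k w[k] * s[n - k], taking s[t] = 0 for t < 0."""
    return sum(wk * s[n - k] for k, wk in enumerate(w) if n - k >= 0)


def fir_net_1h1(x: Sequence[float],
                hidden_w: List[List[float]], hidden_b: List[float],
                out_w: List[List[float]], out_b: float) -> List[float]:
    """Forward pass of a 1-H-1 FIR network: tanh hidden units, linear output unit.

    y[n] is read as the one-step prediction of x[n + 1].
    """
    n_steps, n_hidden = len(x), len(hidden_w)
    # Each hidden unit filters the input history through its own tapped delay line.
    hidden = [[math.tanh(fir_synapse(hidden_w[h], x, n) + hidden_b[h]) for n in range(n_steps)]
              for h in range(n_hidden)]
    # The output unit filters every hidden unit's history and sums the results.
    return [sum(fir_synapse(out_w[h], hidden[h], n) for h in range(n_hidden)) + out_b
            for n in range(n_steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    H, TAPS = 5, 3  # hypothetical sizes: a 1-5-1 network with 3 taps per synapse
    hw = [[rng.uniform(-0.5, 0.5) for _ in range(TAPS)] for _ in range(H)]
    ow = [[rng.uniform(-0.5, 0.5) for _ in range(TAPS)] for _ in range(H)]
    print(fir_net_1h1([0.6, 0.55, 0.5, 0.52, 0.58], hw, [0.0] * H, ow, 0.0))
```

Under this reading, the 1-5-1 and 1-10-1 networks of Table 1 have H = 5 and H = 10 hidden units; with a single tap per synapse the network reduces to an ordinary multilayer perceptron.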

In  ATM  networks,  traffic  management  techniques  require  traffic  parameters  that  can 
capture  the  various  traffic  characteristics  and  adapt  to  the  changing  network  environment. 
The  method  based  on  FIR  neural  networks  can  adaptively  predict  the  traffic  by  learning 
the  relationship  between  the  past  and  the  future  traffic  variations.  Therefore  it  can  be 
incorporated  into  traffic  control  functions  in  order  to  achieve  better  network  performance. 
In  [Fan  96],  we  propose  a  feedback  flow  control  mechanism  based  on  traffic  prediction  by 
FIR  neural  networks.  The  predicted  traffic  patterns  in  conjunction  with  the  current  queue 
information of the buffer can be used as a measure of congestion. When a congestion threshold is reached, a feedback signal is sent to the sources to reduce their bit rates. Simulation results show that our scheme leads to a much lower cell loss rate than the conventional feedback control method and hence provides simple and efficient traffic management for ATM networks.


References 

[Amenyo 91] Amenyo, J. T., Lazar, A. A. and Pacifici, G. (1991) Cooperative distributed scheduling for ATS-based broadband networks. CTR Technical Report, Columbia University, New York.

[Daigle  86]  Daigle,  J.  N.  and  Langford,  J.  D.  (1986)  Models  for  analysis  of  packet  voice 
communications  systems.  IEEE  J.  Selected  Areas  in  Comm.,  SAC-4,  847-855. 

[Fan  96]  Fan,  Z.  and  Mars,  P.  (1996)  Access  flow  control  for  ATM  networks  using  a  neural 
network  traffic  predictor,  to  appear  in  Proc.  13th  IEE  Teletraffic  Symposium,  Glasgow, 
UK. 

[Funahashi 89] Funahashi, K. (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183-192.

[Habib  92]  Habib,  I.  W.  and  Saadawi,  T.  N.  (1992)  Multimedia  traffic  characteristics  in 
broadband  networks.  IEEE  Communications  Mag.,  48-54. 

[Haykin  94]  Haykin,  S.  (1994)  Neural  Networks.  Macmillan,  New  York. 

[Heffes  86]  Heffes,  H.  and  Lucantoni,  D.  M.  (1986)  Markov  modulated  characterization 
of  packetized  voice  and  data  traffic  and  related  statistical  multiplexer  performance. 
IEEE  J.  Selected  Areas  in  Comm.,  SAC-4,  856-868. 

[Hiramatsu  90]  Hiramatsu,  A.  (1990)  ATM  communications  network  control  by  neural 
networks.  IEEE  Trans.  Neural  Networks,  1,  122-130. 

[Hornik 89] Hornik, K., Stinchcombe, M. and White, H. (1989) Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366.

[Lang 88] Lang, K. J. and Hinton, G. E. (1988) The development of the time-delay neural network architecture for speech recognition. Technical Report CMU-CS-88-152, Carnegie-Mellon University, PA.

[Lapedes  87]  Lapedes,  A.  and  Farber,  R.  (1987)  Nonlinear  signal  processing  using  neural 
networks:  Prediction  and  system  modeling.  Technical  Report  LA-UR-87-2662,  Los 
Alamos  National  Laboratory,  NM. 

[Leland  93]  Leland,  W.  E.,  Taqqu,  M.  S.,  Willinger,  W.  and  Wilson,  D.  V.  (1993)  On  the 
self-similar  nature  of  Ethernet  traffic.  Bellcore  Technical  Report,  NJ. 




[Maglaris 88] Maglaris, B., Anastassiou, D., Sen, P., Karlsson, G. and Robbins, J. D. (1988) Performance models of statistical multiplexing in packet video communications. IEEE Trans. Commun., 36, 834-843.

[Morris 94] Morris, R. J. T. and Samadi, B. (1994) Neural network control of communications systems. IEEE Trans. Neural Networks, 5, 639-650.

[Onvural 94] Onvural, R. O. (1994) Asynchronous Transfer Mode Networks, Artech House, Boston.

[Rasband  90]  Rasband,  S.  N.  (1990)  Chaotic  Dynamics  of  Nonlinear  Systems,  Wiley,  New 
York. 

[Sriram 86] Sriram, K. and Whitt, W. (1986) Characterizing superposition arrival processes in packet multiplexers for voice and data. IEEE J. Selected Areas in Comm., SAC-4, 833-846.

[Tarraf  94]  Tarraf,  A.  A.,  Habib,  I.  W.  and  Saadawi,  T.  N.  (1994)  A  novel  neural  network 
traffic  enforcement  mechanism  for  ATM  networks.  IEEE  J.  Selected  Areas  in  Comm., 
SAC-12,  1088-1095. 

[Wan 94] Wan, E. A. (1994) Time series prediction by using a connectionist network with internal delay lines, in Time Series Prediction (eds. A. S. Weigend and N. A. Gershenfeld), 195-217, Addison-Wesley.

Biography 

Zhong  Fan  received  the  BSc  and  MPhil  degrees  in  electronic  engineering  from  Tsinghua 
University, Beijing, in 1992 and 1994, respectively. He is now working toward his PhD at the University of Durham, UK. His research interests include neural networks, traffic modeling
and  control  of  ATM  networks. 

Philip  Mars  is  Professor  of  Electronics  and  Director  of  the  Center  for  Telecommunication 
Networks  at  the  University  of  Durham,  UK.  He  is  the  coauthor  of  two  research  monographs 
and  over  120  published  papers.  His  research  interests  are  in  the  application  of  nonsymbolic 
AI  to  telecommunications  and  in  network  performance  modelling  and  simulation. 


6 


Analysis, simulation and experimental verification of the throughput of GCRA based UPC functions for CBR streams

F.  W.  Hoeksema 

University of Twente, Tele-Informatics & Open Systems Group
P.O.  Box  217,  7500  AE  Enschede.  The  Netherlands 
Tel.: +31 53 489 2770, Email: hoeksema@cs.utwente.nl

J.  Kroeze 

Ericsson  Telecommunication 

P.B.  8,  5120  AA  Rijen,  The  Netherlands 

Tel.:  +31  161  242  466.  Email:  etmjohls@etm.ericsson.se 

J.  Witters 
Alcatel  Bell 

Francis Wellensplein 1, B-2018 Antwerp, Belgium
Tel.: +32 3 240 79 27, Email: jM-i@rc.bel.alcatel.be


Abstract 

This paper investigates a Generic Cell Rate Algorithm (GCRA) based Usage Parameter Control (UPC) function implementing a discard function for non-conforming cells, thereby establishing a so-called Cell Discard Ratio (CDR) as UPC performance measure. Focusing on Constant Bit Rate (CBR) connections, the UPC transfer characteristics are studied in case of Peak Cell Rate Contract Violations. The influence of the Contracted Cell Delay Variation Tolerance value on the CDR performance of the UPC is also incorporated in the study.
Algorithmic formulas found by analysis as well as simulation results are compared against the outcome of test-bed measurements. As opposed to the approach in the analysis, the simulation effectively takes into account the delay experienced by the cells accessing the ATM slotted medium, as sole origin of cell delay variation. The applicability of the simulation and the analysis is verified with measurements on real ATM cell streams, using the R2061 EXPLOIT test-bed. Although both the simulation and analysis show clear correspondence with the measurement results, slight deviations are found. These deviations can be partly explained by the slotted nature of ATM networks, which shows the importance of taking this effect into account in the performance analysis of UPC behaviour.

The study results in guidelines for setting the parameters involved in policing ATM CBR cell streams. These guidelines are verified by a test-bed measurement with real CBR video data transported over ATM using AAL1.


Keywords 

B-ISDN,  ATM,  UPC,  GCRA,  CBR,  throughput  analysis,  simulation,  measurements 


1.  INTRODUCTION 

The UPC function is an ATM layer traffic control function and is located at the Public UNI (Pu-UNI) of an ATM network [6]. Its objective is to monitor and control traffic per Virtual Channel Connection or Virtual Path Connection (VCC/VPC) in terms of traffic offered and in terms of validity of the ATM connection. In the sequel the validity of a VCC/VPC is assumed.

Here we focus on policing (a popular term for "UPC action") of the Peak Cell Rate (PCR or Rp) of CBR sources. This Peak Cell Rate is defined as the reciprocal of the minimal interarrival time between two consecutive requests to send an ATM_PDU (the 53 byte ATM cell) at the PHY_SAP in ATM Terminal Equipment (TE). The minimal interarrival time is called the Peak Emission Interval (PEI or Tp), so Rp = 1/Tp. The arrival times of the CBR input traffic are given by:

$$\left\{\, t_a^{\text{PHY\_SAP@TE}}[k] = t_a^{\text{PHY\_SAP@TE}}[1] + (k-1)\,T_p;\ k \geq 1 \,\right\} \qquad (1)$$

in which t_a[k] is the arrival time of the k-th cell of the connection.

As a result of negotiations between TE and network during the connection set-up phase, the part of the Traffic Contract necessary for Peak Cell Rate policing is agreed upon. This part consists of a Contracted Peak Cell Rate Rc (= 1/Tc) and a Contracted Cell Delay Variation (CDV) Tolerance τc. The CDV Tolerance allows for a certain degree of cell clumping, and can be seen as a measure of burstiness of the cell stream at the Pu-UNI. In order to specify unambiguously in the Traffic Contract which cells of a connection are conforming and which are not, the Generic Cell Rate Algorithm, as described by the ATM Forum [7], is used as a Conformance Definition. A Conformance Definition can be considered a deterministic means of classifying stochastic source traffic patterns.

The  GCRA  may  not  only  be  used  to  classify  traffic  patterns,  but  also  as  an  algorithm  to 
monitor and control traffic. The UPC function investigated in this article is GCRA(Tc, τc). Other
algorithms  which  may  be  used  as  a  UPC  function  (e.g.  moving  window,  sliding  window)  have 
been  compared  against  GCRA  in  [5].  It  is  assumed  that  the  UPC  function  does  not  buffer  more 
than  one  cell. 

Each  cell  in  the  ATM  connection  is  labelled  conforming  or  non-conforming  by  the  UPC 
function.  Cells  which  are  labelled  non-conforming  are  assumed  to  be  discarded  in  this  work 
(this  is  not  a  necessity  however,  see  [6], [7]). 
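The GCRA used as UPC function can be written in its virtual scheduling form, one of the two equivalent forms given in the ATM Forum UNI specification [7]: a theoretical arrival time (TAT) is kept; a cell arriving at time t is non-conforming if t < TAT − τc, and otherwise TAT is set to max(t, TAT) + Tc. The Python sketch below (function name hypothetical) returns one conforming/non-conforming flag per cell; non-conforming cells leave TAT unchanged, consistent with discarding them.

```python
from typing import Iterable, List, Optional


def gcra_conformance(arrivals: Iterable[float], t_c: float, tau_c: float) -> List[bool]:
    """Virtual scheduling form of GCRA(T_c, tau_c): True = conforming, False = non-conforming."""
    tat: Optional[float] = None  # theoretical arrival time of the next cell
    flags: List[bool] = []
    for t in arrivals:
        if tat is None:          # first cell of the connection always conforms
            tat = t
        if t < tat - tau_c:      # too early: non-conforming (discarded), TAT unchanged
            flags.append(False)
        else:                    # conforming: schedule the next theoretical arrival
            tat = max(t, tat) + t_c
            flags.append(True)
    return flags


if __name__ == "__main__":
    t_p, t_c = 1.0, 2.0  # CBR stream violating the contract (Tp < Tc)
    flags = gcra_conformance([k * t_p for k in range(10)], t_c, 0.0)
    x, y = len(flags), sum(flags)
    print(flags, "discarded:", x - y, "of", x)
```

For a CBR stream with Tp = 1 policed by GCRA(2, 0), every second cell is conforming; the length of the list and the number of True flags play the roles of X and Y in the CDR definition that follows.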
So, if X is the number of cells arriving at the ingress of the UPC function since the beginning of the connection, and Y is the number of (conforming) cells at the egress of the UPC function since the beginning of the connection, the number of discarded cells is X − Y and we may define a Cell Discard Ratio (CDR) as

$$\mathrm{CDR}[X] = \frac{X - Y}{X}.$$

Note that the CDR depends on the number of transmitted cells. The relation with the elapsed time t since the beginning of the connection can be made explicit by defining X(t) and Y(t), and thus

$$\mathrm{CDR}(t) = \frac{X(t) - Y(t)}{X(t)}.$$

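As the interest is in the long-term behaviour of a UPC function, we define

$$\mathrm{CDR}_\infty = \lim_{X \to \infty} \mathrm{CDR}[X] \quad \text{or} \quad \mathrm{CDR}_\infty = \lim_{t \to \infty} \mathrm{CDR}(t) \qquad (2)$$

provided that these limits exist.

In the CBR case we have $X(t) = \lfloor t / T_p \rfloor + 1$ and $\lim_{t \to \infty} X(t)/t = R_p$.

Defining the Passed Cell Rate $R_o = \lim_{t \to \infty} Y(t)/t$ (provided this limit exists), we find

$$\mathrm{CDR}_\infty = \lim_{t \to \infty} \frac{X(t) - Y(t)}{X(t)} = \frac{R_p - R_o}{R_p} = 1 - \frac{R_o}{R_p} \qquad (3)$$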

If a stream is not conforming to GCRA(Tc, τc), e.g. because Rp > Rc (or equivalently because Tp < Tc), cells are discarded: the Traffic Contract is violated. The discarding of excess traffic is in accordance with the Traffic Contract and does not contribute to the network performance degradation allocated to the UPC function [section 3.2.3.2, 6].

In this paper the throughput behaviour of a UPC function is considered "ideal" if the amount of discarded cells is proportional to the amount of contract violation (a property called Throughput Fairness (TF) [3], [4]), and if no cells are lost when Tp > Tc. So, the "ideal" throughput behaviour is given by

$$\mathrm{CDR}_\infty = 1 - \frac{T_p}{T_c} \ \text{for } \delta < T_p < T_c \quad \text{and} \quad \mathrm{CDR}_\infty = 0 \ \text{for } T_p > T_c \qquad (4)$$

In other words, the "ideal" throughput behaviour of the UPC function is defined as the property of always passing the Contracted PCR, irrespective of the magnitude of the Contract Violation.
In figure 1 two views of the "ideal" throughput behaviour of a UPC algorithm are depicted.


Figure 1. "Ideal" throughput behaviour of a UPC function.

The left side is a graphical representation of formula (4), while the right side of the figure shows the normalised Passed Cell Rate as a function of the Peak Cell Rate (both normalised with respect to the Contracted Cell Rate). The figure can easily be derived by rewriting formulas (3) and (4) as

$$\frac{R_o}{R_c} = \left(1 - \mathrm{CDR}_\infty\right) \frac{R_p}{R_c} \qquad (5)$$

δ denotes the ATM Cell Slot Time, the inverse of the Cell Transfer Capacity C [cell/s] of the link at the PHY_SAP (e.g. C = 155.52·10^6 / (53·8) cell/s for STM-1). It is assumed that Rp < C, or equivalently Tp > δ.
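The quantities in eqs. (4) and (5) and the slot time δ are simple arithmetic. The short sketch below (helper names hypothetical) computes δ for the STM-1 example and checks that "ideal" behaviour passes exactly the contracted rate whenever Tp < Tc. Using 155.52 Mbit/s, as in the text, treats the full STM-1 line rate as cell capacity; PHY layer overhead (CDV cause 3/. in section 2) makes the rate available to cells somewhat lower.

```python
def cell_slot_time(line_rate_bit_s: float, cell_bits: int = 53 * 8) -> float:
    """delta = 1 / C, with C = line rate / cell size [cell/s], as in the STM-1 example."""
    return cell_bits / line_rate_bit_s


def ideal_cdr(t_p: float, t_c: float) -> float:
    """'Ideal' CDR of eq. (4): 1 - T_p / T_c for T_p < T_c, otherwise 0."""
    return 1.0 - t_p / t_c if t_p < t_c else 0.0


def normalised_passed_rate(t_p: float, t_c: float, cdr: float) -> float:
    """R_o / R_c = (1 - CDR) * R_p / R_c, eq. (5), using R = 1 / T."""
    return (1.0 - cdr) * t_c / t_p


if __name__ == "__main__":
    delta = cell_slot_time(155.52e6)
    print(f"C = {1 / delta:.0f} cell/s, delta = {delta * 1e6:.3f} us")
    for t_p in (5.0, 8.0, 10.0, 12.0):  # PEIs in units of delta, with T_c = 10 delta
        print(t_p, normalised_passed_rate(t_p, 10.0, ideal_cdr(t_p, 10.0)))
```

For STM-1 this gives C ≈ 366 792 cell/s and δ ≈ 2.73 µs; the normalised passed rate is 1 for every violating PEI and Rp/Rc otherwise, which is the right-hand view of figure 1.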

Note that our UPC function acts instantaneously on a per-cell basis; no shaping is included (shaping might be beneficial from the user's point of view, provided that the introduced delay is acceptable for the service).

In this article we compare the throughput behaviour of the GCRA based UPC function with respect to Throughput Fairness, using results from analysis, simulation and test-bed measurements. Guidelines for the selection of the Contracted CDV Tolerance will be presented.

In  the  next  section  the  terminal  configuration,  the  source  traffic  and  causes  of  CDV  are 
presented.  In  section  3  the  results  of  analysis  and  guidelines  for  the  selection  of  the  Contracted 
CDV  Tolerance  are  given.  The  simulation  results  are  presented  in  section  4  and  test-bed 
measurements  in  section  5.  In  section  6  the  guidelines  are  verified  by  policing  a  CBR  video 
stream. 

In  section  7  results  of  analysis,  simulation  and  test-bed  measurement  are  compared. 

Section  8  contains  our  conclusions. 
2. TERMINAL CONFIGURATION, SOURCE TRAFFIC AND CDV

A  CBR  source  may  be  connected  to  ATM  Terminal  Equipment  (TE)  via  AAL1  at  a  VCC/VPC 
endpoint.  This  source  produces  a  stream  of  requests  to  send  an  ATM_SDU  (ATM  cell  payload, 
48  byte;  interaction  primitive  ATM_SDU_Data.request).  It  is  assumed  that  the  previously 
mentioned  stream  (the  user  data  component  [section  2.3.3,  6]  of  the  tagged  connection)  is 
multiplexed  at  the  ATM  layer  in  the  TE  with  Operations,  Administration  and  Maintenance 
(OAM)  cells  and  cell  streams  of  other  connections.  After  multiplexing,  the  cells  of  the  tagged 
connection  are  considered  to  be  shaped  by  a  shaper,  resulting  in  a  stream  with  PEI  Tp. 


Figure 2. Terminal Example (ATM_SDU_Data.requests at the endpoint of the tagged VCC/VPC; ATM_PDU_Data.requests at the PHY_SAP of the TE, where the tagged stream conforms to GCRA(Tp, 0); Public UNI).


In figure 2 a terminal example is given, using an adaptation of both figure 4 of I.371 [6] and the PCR Reference Model of the ATMF UNI 3.0 specification [7].

At the PHY_SAP of the TE the cell stream (stream of ATM_PDU_Data.requests) of the tagged connection has a PEI of Tp, as a result of the shaper. However, this is no longer the case at the Public UNI (stream of ATM_PDU_Data.indications at the PHY_SAP of the Pu-UNI). Cell Delay Variation is introduced by the following mechanisms:

•  1/.  Due  to  the  ATM  multiplexing  with  OAM  cells  and  cells  of  other  connections  at  the 
PHY_SAP  in  TE  some  cells  may  be  delayed. 

•  2/.  The ATM_PDU_Data.indications at the PHY_SAP of the Pr-UNI only occur at discrete time instances. This so-called slotted nature of ATM networks causes CDV.

•  3/.  Due  to  insertion  of  PHY  layer  overhead  the  ATM  cell  slot  times  may  not  be  of  equal 
length. 

•  4/.  Between the Pr-UNI and Pu-UNI, CDV may be introduced by Customer Premises Equipment (CPE).




Note  that  the  precise  nature  of  CDV  causes  in  the  CPE  is  left  unspecified  here.  It  may  consist 
of,  but  not  be  limited  to:  ATM  multiplexing,  the  slotted  nature  of  ATM  networks  or  PHY  layer 
overhead. 
Figure 3. CDV of cells of the tagged connection (no CPE caused CDV). Panels: ATM_PDU_Data.requests of the tagged connection at the PHY_SAP of TE; ATM_PDU_Data.requests of OAM and other connections at the PHY_SAP of TE; sum of all connections.


In Figure 3 an example of the introduction of CDV is given (CDV causes 1/. to 3/., no CDV caused by 4/.). Note that the shaper guarantees a PEI of Tp at the PHY_SAP of the TE for the tagged connection. Competing ATM_PDU_Data.requests at the PHY_SAP in the TE are assumed to be in continuous time (see the summation in the figure above).

In the following analysis (section 3) all four causes of CDV are neglected. In the simulation (section 4), however, the slottedness (cause 2/.) is taken into account. During the measurements with an ATM traffic generator, CDV causes 3/. and 4/. may be present (section 5). Finally, the traffic from a TV Terminal Adapter may experience all CDV causes mentioned above (see section 6).
3. ANALYSIS


In figure 4 the cell streams of figure 2 are given, only showing CDV caused by the slottedness of ATM networks (CDV cause 2/.).

Figure 4. CDV caused by the slotted nature of ATM only (cells of the tagged connection only).


The slotted nature of ATM networks is neglected in the presented expressions, but it is important when comparing our results with results of simulation experiments [3], [4] and test-bed measurements [12]. The assumption we make is that (apart from a constant delay):

$$t_a^{\text{PHY\_SAP@Pu-UNI}}[k] = t_a^{\text{PHY\_SAP@TE}}[k] \qquad (6)$$

while, when cause 2/. is taken into account (see figure 4):

$$t_a^{\text{PHY\_SAP@Pu-UNI}}[k] = \left\lceil \frac{t_a^{\text{PHY\_SAP@TE}}[k]}{\delta} \right\rceil \delta \qquad (7)$$

Only the situation of Traffic Contract Violation is investigated, thus δ < Tp < Tc (no cells are lost if Tp > Tc). In the sequel we present CDR∞ as a function of Tc, Tp and τc; a derivation of these results is given in [13].

Three cases can be distinguished, depending on the Contracted CDV Tolerance τc:

•  τc = 0
In this case:

$$\mathrm{CDR}_\infty = 1 - \frac{1}{N}, \quad N = \left\lceil \frac{T_c}{T_p} \right\rceil \qquad (8)$$

•  0 < τc < Tp
In this case an algorithmic solution was found in [13]. It consists of finding P and M (both required to be integer and as small as possible) which satisfy:

$$M \frac{T_c}{T_p} + 1 \leq P < M \frac{T_c}{T_p} + 1 + \left(1 - \frac{\tau_c}{T_p}\right) \qquad (9)$$

or equivalently:

$$P < \frac{M T_c - \tau_c}{T_p} + 2 \leq P + \left(1 - \frac{\tau_c}{T_p}\right) \qquad (10)$$

As P > M, the algorithm should start with M = 1. Then:

$$\mathrm{CDR}_\infty = 1 - \frac{M}{P - 1} \qquad (11)$$

•  τc ≥ Tp
It can be shown that the throughput behaviour is "ideal" in this case, so:

$$\mathrm{CDR}_\infty = 1 - \frac{T_p}{T_c} \qquad (12)$$
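The three cases (8) to (12) translate directly into code. The sketch below (function names hypothetical) uses exact rational arithmetic from Python's fractions module to avoid rounding at the integer boundaries of (9), and cross-checks the result against a direct GCRA run over periodic, unslotted arrivals. The bounds in (9) are taken as ≤ on the left and < on the right, the reading that matches the GCRA tie rule (a cell arriving exactly at TAT − τc conforms).

```python
import math
from fractions import Fraction


def cdr_analytic(t_c, t_p, tau_c):
    """CDR_inf of a CBR stream (PEI t_p < t_c) policed by GCRA(t_c, tau_c), slotting ignored.

    Implements eqs. (8), (9)/(11) and (12); pass ints or Fractions for exact results.
    """
    c = Fraction(t_c) / Fraction(t_p)    # T_c / T_p > 1
    s = Fraction(tau_c) / Fraction(t_p)  # tau_c / T_p
    if s == 0:
        return 1 - Fraction(1, math.ceil(c))          # eq. (8)
    if s >= 1:
        return 1 - 1 / c                               # eq. (12), "ideal"
    m = 1
    while True:                                        # smallest M admitting an integer P
        lower = m * c + 1                              # eq. (9): lower <= P < lower + 1 - s
        p = math.ceil(lower)
        if p < lower + 1 - s:
            return 1 - Fraction(m, p - 1)              # eq. (11)
        m += 1


def cdr_gcra(t_c, t_p, tau_c, n_cells):
    """Empirical CDR[X] for X = n_cells CBR arrivals at k * t_p through GCRA(t_c, tau_c)."""
    tat, passed = None, 0
    for k in range(n_cells):
        t = k * Fraction(t_p)
        if tat is None or t >= tat - tau_c:            # conforming cell
            tat = t + t_c if tat is None else max(t, tat) + t_c
            passed += 1
    return Fraction(n_cells - passed, n_cells)


if __name__ == "__main__":
    F = Fraction
    for tau in (0, F(1, 10), F(3, 5), 1):
        print(tau, cdr_analytic(F(5, 2), 1, tau), cdr_gcra(F(5, 2), 1, tau, 10_000))
```

For example, with Tc = 2.5 Tp and τc = 0.6 Tp no integer P exists for M = 1, and M = 2, P = 6 gives CDR∞ = 1 − 2/5 = 0.6, which the simulated GCRA reproduces.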

Now,  we  will  present  some  results  of  the  analysis  which  can  be  directly  compared  to  the 
results  of  simulation  (section  4)  and  test-bed  measurements  (section  5).  Comparisons  will  be 
deferred  until  the  simulation  and  test-bed  results  are  presented. 


In figure 5 the CDR∞ as a function of the PEI Tp is presented for Tc = 10δ for different values of the Contracted CDV Tolerance τc. The "ideal" throughput curve is shown too (see figure 1).


Figure 5. CDR∞ as a function of Tp [δ] for Tc = 10δ. Different values of τc.




Figure 6 shows the Passed Cell Rate Ro as a function of the Peak Cell Rate Rp for both the "ideal" behaviour and the behaviour for τc = 8δ with Tc = 63.75δ.


Figure  6.  Passed  Cell  Rate  Ro  as  a  function  of  PCR. 

Figure  7  offers  a  closer  look  at  the  previous  figure,  and  shows  the  influence  of  a  change  in 
Contracted  CDV  Tolerance. 


Figure  7.  Passed  Cell  Rate  Ro  as  a  function  of  PCR. 


From our analysis it is clear that the throughput behaviour for τc = 0 is far from "ideal". Increasing τc in the region 0 < τc < Tp improves the throughput behaviour of the GCRA based UPC function, but does not realize "ideal" throughput behaviour either. Only when τc ≥ Tp is the situation reached in which the Throughput Fairness property holds.

These  observations  allow  us  to  provide  guidelines  for  the  selection  of  the  Contracted  CDV 
Tolerance  if  Throughput  Fairness  is  required: 

In the connection set-up phase, the user presents the requested PEI Tu = Tp and the requested CDV Tolerance τu = 0 to the network. For the UPC function to show "ideal" throughput behaviour, our analysis shows that the contracted values should be Tc = Tp and τc = Tp = Tc. If a user intends to violate the Traffic Contract and specifies the requested PEI as Tu = Tp' > Tp (the actual PEI) and the requested CDV Tolerance as τu = 0, the network will select Tc = Tp' and τc = Tp' = Tc if "ideal" UPC throughput behaviour is required. From our analysis we notice that the Contracted CDV Tolerance is higher than necessary in this case: τc = Tp < Tp' would suffice. As the network has no knowledge of the intentions of the user, the only "fair" selection is τc = Tc = Tp'.

In [3] it is shown that the following approximation of the CDR holds when Δ is fairly small and Tc > τc:

$$\mathrm{CDR}_\infty \approx \frac{T_c\,\Delta}{\tau_c}, \qquad \Delta = \frac{T_c - T_p}{T_c} \tag{13}$$

This result is in accordance with [1], [4] and [2], where a more general relation is given:

$$\mathrm{CDR}_\infty = 1 - \frac{d}{d+1}, \qquad d = \left\lceil \frac{\tau_c}{T_c - T_p} \right\rceil \tag{14}$$

This relation agrees with the algorithm in (8), (9) and (10). Note, however, that (14) only holds for 0 < τc < Tp and Tc/2 < Tp < Tc, as stated as a (strong) conjecture in [13]. Using (14), it can be shown that the approximation in (13) is in fact an upper bound to CDR∞, which becomes tighter as Δ ≪ 1. So, for small contract violations, (13) shows that for τc ≥ Tc indeed CDR∞ = Δ.
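As a numerical cross-check of (13) and of the throughput behaviour described above, the GCRA can be simulated directly. The Python sketch below is ours and is not the algorithm (8)-(10) of the analysis. It makes the following assumptions:

- the GCRA is applied in its virtual-scheduling form;
- the cell stream is strictly periodic and free of CDV, in continuous time;
- time values are integers, so that all comparisons are exact;
- the function names `gcra_cdr` and `cdr_approx_13` are hypothetical.

A finite `n_cells` only approximates CDR∞.

```python
def gcra_cdr(Tc, tau_c, Tp, n_cells=100_000):
    """Fraction of cells discarded by GCRA(Tc, tau_c) in its virtual-scheduling
    form when it polices a strictly periodic stream with inter-arrival time Tp.
    Integer time values keep all comparisons exact."""
    tat = None                    # theoretical arrival time (TAT) of the next cell
    discarded = 0
    for k in range(n_cells):
        t = k * Tp                # arrival time of cell k (no CDV at all)
        if tat is None or tat <= t:
            tat = t + Tc          # at or after TAT: conforming, schedule restarts
        elif tat - t <= tau_c:
            tat += Tc             # early, but within the CDV tolerance: conforming
        else:
            discarded += 1        # too early: discarded, TAT left unchanged
    return discarded / n_cells


def cdr_approx_13(Tc, tau_c, Tp):
    """Approximation (13) for tau_c > 0: CDR ~ Tc * Delta / tau_c, Delta = (Tc - Tp) / Tc."""
    delta = (Tc - Tp) / Tc
    return Tc * delta / tau_c


if __name__ == "__main__":
    Tc, Tp = 10, 7                # arbitrary time unit, Tc/2 < Tp < Tc, Delta = 0.3
    for tau_c in (0, 2, 4, 7, 10):
        print(f"tau_c = {tau_c:2d}: simulated CDR = {gcra_cdr(Tc, tau_c, Tp):.4f}")
```

The example uses Tc = 10 and Tp = 7, so Tc − Tp = 3 and Δ = 0.3. The simulated CDR behaves as follows:

- it stays at 50% for τc below Tc − Tp;
- it drops to about 1/3 at τc = 4;
- it settles at Δ once τc ≥ Tp.

This is the non-proportional behaviour described above.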


4.  SIMULATION 

The GCRA based UPC function is analysed by simulation, taking into account the delay experienced by cells accessing the slotted ATM medium (CDV cause 2/. in section 2). ATM cells are generated periodically by a CBR source on the real time axis. Due to the transfer to the slotted ATM medium, some cells are delayed while others are not, which introduces a limited CDV (see figure 4). This cell flow is then passed to the UPC function, where the GCRA is executed and the CDR is measured.


Figure 8. CDR as a function of Tp, Tc = 10δ.

Figure 8 shows the CDR for a Contracted PEI Tc = 10δ as a function of the actual PEI and of the Contracted CDV Tolerance τc. A comparison with the analytical results (see figure 5) shows many similarities. The major differences are oscillations at the transitions between regions of constant CDR. These occur at PEIs that are not an integer multiple of the cell slot time δ. In that case some cells are delayed by the access to the slotted ATM medium while others are not, so the arrivals at the UPC function are no longer strictly periodic (e.g. figure 4, lower part).

Indeed, a CBR connection which exceeds the Contracted PCR Rc by only a small amount immediately experiences a severe Cell Discard Ratio, out of proportion to the degree of misbehaviour Δ (see (13)). On the other hand, a source which sends twice the allowed amount of traffic sees a CDR of 50%, a value in accordance with the TF property.

From the simulations it became clear that a GCRA based UPC function tuned with the Contracted PEI Tc and τc = ⌈Tc/δ⌉δ ≥ Tc achieves Throughput Fairness. Similar findings have been reported in [8] when observing the UPC responsiveness characteristics.
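The slotted-medium effect described in this section can be sketched in the same way. Cells are generated every Tp on the real time axis, and each one is delayed to the next slot boundary (slot time δ). The resulting arrivals are then policed by the GCRA. This is a minimal Python sketch under our own assumptions:

- time is measured in integer units of δ/100;
- no other CDV source is present;
- the function names are hypothetical.

The sketch also encodes the guideline τc = ⌈Tc/δ⌉δ.

```python
def slotted_arrivals(Tp, delta, n_cells):
    """Arrival times at the UPC: cell k is generated at k*Tp and waits for the
    next slot boundary, i.e. it arrives at ceil(k*Tp/delta)*delta."""
    return [-(-k * Tp // delta) * delta for k in range(n_cells)]


def gcra_discard_ratio(arrivals, Tc, tau_c):
    """GCRA(Tc, tau_c), virtual-scheduling form, applied to arbitrary arrival times."""
    tat = None
    discarded = 0
    for t in arrivals:
        if tat is None or tat <= t:
            tat = t + Tc            # at or after TAT: conforming
        elif tat - t <= tau_c:
            tat += Tc               # within the CDV tolerance: conforming
        else:
            discarded += 1          # non-conforming: discarded, TAT unchanged
    return discarded / len(arrivals)


def guideline_tau(Tc, delta):
    """Contracted CDV tolerance tau_c = ceil(Tc / delta) * delta, which is >= Tc."""
    return -(-Tc // delta) * delta


if __name__ == "__main__":
    delta = 100                     # one cell slot = 100 time units
    Tc = 10 * delta                 # Contracted PEI of 10 slots, as in figure 8
    for Tp in range(900, 1001, 25):
        arr = slotted_arrivals(Tp, delta, 50_000)
        print(Tp / delta,
              gcra_discard_ratio(arr, Tc, 0),
              gcra_discard_ratio(arr, Tc, guideline_tau(Tc, delta)))
```

The two printed columns compare τc = 0 with the guideline value. With the guideline tolerance, the simulated CDR should approach Δ = 1 − Tp/Tc, because once the bucket has filled, the policer passes exactly one cell per Tc.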


5.  MEASUREMENTS 

The  initial  aim  of  the  experiments  was  to  validate  the  correct  operation  of  the  implemented  UPC 
function,  see  [12].  Here  the  measurement  results  are  used  to  verify  the  applicability  of  the 
analysis  and  simulations  in  the  previous  sections,  using  a  real  ATM  network. 

The  experiments  were  performed  at  the  EXPLOIT  test-bed,  which  has  four  ATM  switches, 
several  terminals  with  appropriate  adapters  and  measurement  equipment  (see  [10]).  The  Police 
Function  Board  (PFB)  can  implement  several  UPC  mechanisms,  like  Leaky  Bucket  (LB), 
jumping  window,  moving  window  etc.  Here  we  use  the  LB  mechanism  as  UPC  function.  The 
PFB  is  part  of  one  module  in  the  Remote  Unit  (RU)  which  is  one  of  the  available  switches  at 
the  test-bed. 

Figure 9 shows a simplified view of the experiment configuration. As indicated by the arrows, the traffic from four input ports is multiplexed into one stream entering the PFB. Cells may therefore always experience delay variations within the RU before being policed. Even with only one source connected, Operations, Administration and Maintenance (OAM) traffic inside the RU causes CDV. Interpreting the part of the RU before the PFB as CPE allows us to model this as CDV cause 4/. Traffic leaving the PFB is routed back through the RU and demultiplexed to one of the four output ports of this module.

Figure 9. EXPLOIT experimental configuration. RU: Remote Unit (a switch); PFB: Police Function Board.

The ATM traffic streams were generated, received and analysed with the Alcatel 8643, a PC card with memory for 8192 assigned ATM cells. This memory can be played out repetitively, while counters keep track of the numbers of sent and received cells.

The implemented LB is a discrete-state realisation of the GCRA and is defined by three discrete parameters: splash, leak rate and bucket limit. The bucket level is also discrete and ranges from 0 to 65535 units. With every passed cell, a splash (from 0 to 255 units) is added to the bucket level. The bucket leaks at a constant rate, selectable in powers of two from 2⁻¹⁴ to 2⁷ units/slot. A cell arriving at the PFB is discarded if the bucket level exceeds the bucket limit, which can be set from 0 to 65504 in steps of 32. These calculations are all performed by the Police Criterion Chip (PCC) [11].

The  PFB  parameters  are  related  to  the  GCRA  parameters  by: 

$$T_c = \frac{\text{splash}}{\text{leak rate}}, \qquad \tau_c = \frac{\text{bucket limit}}{\text{leak rate}} \tag{15}$$

With appropriate parameters, the PCC can act as a cell counter, so that the numbers of received and discarded cells at the PFB are known. This allows us to verify that all generated cells reach the PFB. It also confirms that cells are lost only through discards and not, e.g., through buffer overflow. Instead of the PFB parameters, in the following we use the parameters Tc and τc, both expressed in slots at 155.52 Mbit/s.
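Relation (15) can be inverted to find PFB settings for given GCRA parameters. The Python sketch below takes the parameter ranges stated above as assumptions:

- an integer splash of 1 to 255 units;
- a leak rate of 2^e units/slot with e from −14 to 7;
- a bucket limit that is a multiple of 32 and does not exceed 65504.

It uses exact fractions. The function names are hypothetical and are not part of any PCC interface.

```python
from fractions import Fraction

SPLASH_MAX = 255                      # splash: 0 ... 255 units
LIMIT_MAX, LIMIT_STEP = 65504, 32     # bucket limit: 0 ... 65504 in steps of 32
LEAK_EXPONENTS = range(7, -15, -1)    # assumed leak rates 2**7 ... 2**-14 units/slot


def pfb_to_gcra(splash, leak_rate, bucket_limit):
    """Relation (15): Tc = splash / leak rate, tau_c = bucket limit / leak rate (slots)."""
    leak_rate = Fraction(leak_rate)
    return Fraction(splash) / leak_rate, Fraction(bucket_limit) / leak_rate


def gcra_to_pfb(Tc, tau_c):
    """Return (splash, leak_rate, bucket_limit) representing Tc and tau_c exactly,
    trying the largest leak rate first, or None if no exact setting exists."""
    Tc, tau_c = Fraction(Tc), Fraction(tau_c)
    for e in LEAK_EXPONENTS:
        leak = Fraction(2) ** e
        splash, limit = Tc * leak, tau_c * leak
        if (splash.denominator == 1 and 0 < splash <= SPLASH_MAX
                and limit.denominator == 1 and 0 <= limit <= LIMIT_MAX
                and limit % LIMIT_STEP == 0):
            return int(splash), leak, int(limit)
    return None


if __name__ == "__main__":
    print(gcra_to_pfb(Fraction("63.75"), 8))    # Tc = 63.75 slots, tau_c = 8 slots
```

Under these assumptions, Tc = 63.75δ with τc = 8δ can be represented only with a leak rate of 4 units/slot, a splash of 255 and a bucket limit of 32. By contrast, Tc = 64δ with τc = 8δ cannot be represented exactly.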

Some initial measurements were made where the PCR Rp and the Contracted PCR Rc were exactly the same (Tp = Tc = 64δ). They revealed a Cell Discard Ratio of approximately 2·10⁻⁶, although τc was chosen large enough to allow all possible CDV. These discards are due to frequency deviations between the crystal of the free-running clock of the A8643 and the clock crystal of the RU, to which the PFB is synchronised. Note that, although this causes cell discards, it is not one of the CDV causes mentioned in section 2. To prevent these discards, a slightly lower value of Tc = 63.75δ is used for further measurements.


Figure  10.  Passed  Cell  Rate  Ro  as  a  function  of  PCR. 

Figure 10 depicts the measured throughput of the LB as a function of the PEI Tp of the A8643 for a Contracted PEI Tc of 63.75 slots (at 155.52 Mbit/s). The Contracted CDV Tolerance is τc = 8δ. Note that the dots in figures 10 and 11 are the only measurement points; the line segments merely connect the dots in the correct order. The cell streams generated by the A8643 only have Tp values that are integer multiples of the cell slot time. The cell inter-departure times at the egress of the A8643 are therefore strictly constant, i.e. they follow the pattern shown in the upper part of figure 4.

However, on the way to the RU and the PFB, the cell stream probably suffers CDV by multiplexing with OAM cells in the RU (CDV cause 4/. in section 2). ATM multiplexing in the TE (cause 1/.) does not play a role. Because Tp takes integer values, the slottedness of ATM networks (cause 2/.) introduces no CDV here. Physical layer overhead (cause 3/.) may play a role.

The throughput is defined as the ratio of the actual cell rate of the A8643 to the Contracted PCR, multiplied by the ratio of the numbers of passed and sent cells (i.e. Ro/Rc, see (5)). Both axes are normalised to the Contracted PCR. This figure can be directly compared to figure 6.

It is clear from figure 10 that the throughput depends both on the offered PCR and on the CDV tolerance. As long as the actual PCR does not exceed the Contracted PCR, all cells pass the LB. If, however, the actual PCR rises above the contracted value, the passed traffic is limited to the Contracted PCR, even if the A8643 generates at full link rate. Two measurements with a PCR slightly below and slightly above the contracted one cannot be distinguished in the plot, but the results are as expected. We see again that the punishments imposed by the LB are not proportional to the violation. This is shown more clearly in figure 11, which covers the region of small contract violations (compare with figure 7). The dips in the throughput curve vanish when the CDV tolerance is increased to the Contracted PEI.


Figure 11. Passed Cell Rate Ro as a function of PCR.

6. PEAK CELL RATE CONTROL FOR DIGITAL VIDEO

The experience gained in policing CBR sources was used to police one of the real sources available at the EXPLOIT testbed: the TV signal (audio and video) which originates from the TV-Terminal Adapter (TV-TA). The (composite) TV signal is converted to a digital signal with a bit rate of 34.368 Mbit/s for the video signal and 0.96 Mbit/s for the audio signal. These "reference" signals will eventually be packetized into ATM cells according to the AAL1 adaptation layer standard for CBR traffic. Before that, however, Forward Error Correction (FEC) combined with bit or byte interleaving is applied [9].

For the video component, a Reed-Solomon code is used which can correct at most 4 cells out of 64 consecutive cells. The source characteristics thus change because of the FEC overhead and (in AAL1) signal timing recovery. This results in an ATM physical rate of 41.366 Mbit/s for the video signal and of 1.49 Mbit/s for the audio component.


Figure  12.  Measured  CDR  as  a  function  of  Contracted  CDV  Tolerance,  q  =  Rp  /  Rc 

In the policing experiments, the TV set continuously sends traffic, which is passed through the RU (where the PFB is located) to a TV screen. The Contracted PCR (the PCR used for policing the source) and the Contracted CDV tolerance are changed for each experiment, and the Cell Discard Ratio at the PFB is measured. In figure 12 the CDR is plotted as a function of the Contracted CDV tolerance τc. Each curve corresponds to a different ratio between the PCR of the video source and the Contracted PCR Rc (q = Rp/Rc, Rp = 41.366 Mbit/s). The Contracted PEIs Tc can be derived from the q-values by:

$$T_c = q\,T_p = q\,\frac{C}{R_p}\,\delta \tag{16}$$

This results in Tc = 16.0δ, 15.2δ, 15.0δ, 14.9δ and 13.0δ for the q-values in figure 12.

Suppose the actual PCR of the source is higher than the Contracted PCR (the theoretical Contract Violation region, defined by q > 1). Changing the CDV tolerance (within reasonable limits) then has only a very small influence on the CDR, as can be seen in the upper right part of figure 12. Once the CDV tolerance is larger than the inverse of the Contracted PCR (= Tc), the UPC discards exactly the excess amount of traffic. This result was also found earlier: in figure 11 the passed cell rate Ro equals the Contracted PCR Rc, provided τc is large enough.

This observation leads to a method for deriving experimentally the actual PCR of a policed CBR source once the CDR is measured. In case of a Contract Violation and a CDV Tolerance larger than the Contracted PEI Tc, we observed that Ro = Rc, so using (5) we find:

$$R_p = \frac{R_c}{1 - \mathrm{CDR}} \tag{17}$$

Indeed,  if  this  method  is  applied  to  the  experimental  results,  we  find  a  PCR  of 
41.3666  Mbit/s  on  the  155.52  Mbit/s  link  and  so  a  "reference"  signal  rate  of  34.368  Mbit/s 
exactly  as  mentioned  earlier.  The  same  procedure  was  successfully  used  to  determine  the  PCR 
of  the  audio  component  of  the  TV  signal. 
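Relation (17) is a one-line computation. The Python sketch below (hypothetical function name) adds a guard that restates when it applies: a violating CBR source policed with τc at or above Tc, so that Ro = Rc. The numbers in the usage lines are illustrative only, not measurements from the test-bed.

```python
def estimate_pcr(rc, cdr):
    """Relation (17): Rp = Rc / (1 - CDR).

    Only meaningful for a CBR source in Contract Violation policed with
    tau_c >= Tc, where the passed cell rate Ro equals the Contracted PCR Rc."""
    if not 0.0 <= cdr < 1.0:
        raise ValueError("CDR must lie in [0, 1)")
    return rc / (1.0 - cdr)


if __name__ == "__main__":
    rc = 40.0       # hypothetical Contracted PCR in Mbit/s
    cdr = 0.0331    # hypothetical measured Cell Discard Ratio
    print(f"estimated actual PCR: {estimate_pcr(rc, cdr):.3f} Mbit/s")
```

The guard rejects CDR = 1, for which (17) is undefined.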

Now suppose the actual PCR of the source is below the Contracted PCR (the theoretical no Contract Violation region, defined by q < 1, the lower left part of figure 12). All cells should then be accepted if the assumptions used for the analysis hold. Since discards are nevertheless observed, we conclude from these measurement results that at least one of the CDV causes mentioned in section 2 is present. The clock mismatch problem (section 5) is believed not to influence these results, since the clocks of the TV-TA and the PFB were synchronised.

Since in realistic situations cells apparently always experience some CDV, increasing the CDV tolerance seems necessary. It drastically reduces the CDR, even where no discards are expected (in the theoretical no Contract Violation region, q < 1). This is especially true when the PCR approaches the Contracted PCR (q → 1). To draw further conclusions, the RU needs to be modelled more precisely.


7.  COMPARISON  OF  ANALYSIS,  SIMULATION  AND  MEASUREMENTS 

When comparing the results of analysis and simulation, the impact of the slottedness of ATM networks on the throughput behaviour of a GCRA based UPC function becomes clear. Note that the CDR values at integer values of Tp (in units of δ) in figure 8 should equal the values of CDR∞ at the same values of Tp in figure 5. Graphically this appears to be the case, which also shows that the number of simulated arrivals was large enough.

The results of the test-bed measurements agree well with those of the analysis (at least graphically). From this we conclude that the discards caused by the clock mismatch between the A8643 and the PFB are effectively undone by a slight adaptation of the Contracted PEI Tc. Note that the impact of the slottedness of ATM was not encountered in these experiments, as the cell arrival process only had PEIs that were integer multiples of the cell slot time. Apparently, the OAM traffic generated by the RU (CDV cause 4/., section 2) does not alter the measured throughput function significantly.

The CBR video experiment (section 6) showed that increasing the Contracted CDV Tolerance, even in the theoretical no Contract Violation case, is beneficial from a UPC discard point of view. This was not found with the other approaches, which shows the importance of the measurements. At the same time, it makes clear that the system on which the measurements were performed needs to be modelled in detail.


8.  CONCLUSIONS 

In  this  paper  the  throughput  behaviour  of  a  GCRA  based  UPC  function  for  ATM  traffic  control 
has  been  studied  by  analysis,  simulation  and  measurements. 

The analysis resulted in an algorithm to compute the Cell Discard Ratio CDR∞ for infinitely long CBR traffic streams. The CDR is a function of the contracted GCRA parameters Tc, τc and the PEI Tp of the cell stream. The analysis does not take any CDV-generating mechanism into account. It assumes that cells arrive at the ingress of the UPC function in continuous time, spaced Tp apart.

The results of the simulation are in line with the analysis results as far as they are comparable. The simulation takes into account the CDV due to the slottedness of the physical layer. The influence of the slottedness on the throughput behaviour of the UPC function is limited but significant. The authors have not found analytical results that take the slottedness of ATM into account, although the methods presented in [13] may prove valuable in solving this issue.

Measurements with an ATM traffic generator and an implemented UPC function show that the analysis and simulation apply to real ATM networks: both bear a remarkable resemblance to the measurement results. Effects of clock mismatch can be undone effectively by a slight adaptation of Tc, at least when no CDV due to slottedness is present.

Measurements with real ATM audio and video traffic, however, show that CDV causes other than the slottedness of ATM networks also influence the throughput of the UPC function. This is especially apparent if the PCR approaches the Contracted PCR. In this case increasing τc drastically reduces the cell discard. To explain these observations, the experimental configuration has to be modelled in greater detail.

Guidelines for the selection of Tc and τc were presented, based on the results from analysis and simulation. These guidelines proved useful when applied to a realistic situation, although τc also has to be increased in the theoretical no Contract Violation case. By how much exactly remains to be determined; this is an item for further study.

The research may be followed up by extending the analysis and simulation to include more CDV causes. Both approaches should be tested with measurements on systems that are described at the level of detail required by the models used.

A study consisting of analysis, simulation and measurements of the policing mechanism for
traffic  using  other  ATM  Transfer  Capabilities  (a.k.a.  ATM  service  categories  or  connection 
types)  seems  necessary,  e.g.  for  Variable  Bit  Rate  traffic,  for  Available  Bit  Rate  traffic  or  for 
Signalling  traffic. 

Policing  of  the  Sustainable  Cell  Rate  is  another  candidate  subject  for  further  study. 


Abstract

Multi-media and data traffic are anticipated to occupy much of the resources in integrated services networks based on ATM. These traffic types appear to exhibit strong autocorrelation over long periods, which affects the performance of statistical multiplexing detrimentally. The correlation has most commonly been handled by spreading the traffic in time, so-called shaping, which may introduce considerable delay.

We take a different approach, namely spreading the traffic in space over multiple, independent paths. The autocorrelation in the traffic is thereby reduced and bursts are spread out. This alleviates queuing delay and, for a given quality level, lowers the capacity needed for each transmission. We denote this strategy traffic dispersion.

In this paper, we focus on how traffic dispersion affects the equivalent capacity needed for a transmission. By studying its behaviour, we can determine under what circumstances spatial traffic dispersion is worthwhile for different cost functions, when using a certain number of paths in the network. The first cost function is a fixed charge per capacity unit. Next, we add a fixed charge per connection to the previous cost, and lastly, we let the charge per path increase progressively. Our findings show that spatial traffic dispersion alleviates the most troublesome traffic cases, that is, those with a high peak-to-mean ratio and those with a high peak-to-link ratio. Furthermore, the cost benefits due to dispersion seem to justify the extra effort needed to implement it.

This  work  was  in  part  presented  at  the  IFIP  TC6  Third  Workshop  on  Performance  Modelling 
and  Evaluation  of  ATM  Networks,  Ilkley,  U.K.,  July  1995. 

1 INTRODUCTION

The asynchronous transfer mode (ATM) is the network architecture that the International Telecommunication Union recommends for broadband integrated services digital networks. Succinctly described, the mode combines the circuit-switched routing of telephone systems with the statistical multiplexing of packet switching. This is accomplished by establishing a connection (fixed route) through the network before accepting any traffic. The information is then sent over the connection in 53-octet cells, which are routed according to address information contained in their 5-octet headers.

The capacity of a transmission link is statistically shared among the connections traversing it. When traffic arrives randomly, the capacity offered by the link occasionally becomes insufficient. This could be handled by buffering, but as the arrivals come in longer and longer bursts, the buffers will eventually overflow and cells will be lost.

Earlier studies have shown that the probability of cell loss is highly dependent on the correlation in the multiplexed traffic stream, Li (1989). For a given connection, the correlation can be lowered by spreading the traffic in time, so-called shaping. This method may, however, give rise to delays too large to be tolerated by the application. So, statistical multiplexing of traffic streams with strong correlation would, at a low probability of cell loss, require unreasonably low utilization of the network resources and excessively large buffers.

Ever since Maxemchuk's contribution, Maxemchuk (1975), there have been several different suggestions for spreading the traffic from a source in space rather than in time, as a means for load balancing and fault handling in packet-switched networks, Gustafsson (1994:1). Spatial traffic dispersion means that a message is divided into a number of sub-messages, which are transmitted in parallel over disjoint paths in the network, as shown in Figure 1. A large burst of data will consequently be sent as more moderately sized bursts, and the correlation will be reduced without the extra delay that temporal shaping would introduce.


Figure  1  Illustration  of  spatial  traffic  dispersion. 

The traffic from a source is transmitted in parallel through the network and resequenced at the receiver. Dispersion should be possible in any network where disjoint paths exist between the source and the destination. Given a number of such paths, the traffic may be spread according to different strategies. One possibility is to spread the packets in the traffic stream cyclically over the paths, a solution which is discussed in Lee and Liew (1993) and Maxemchuk (1993). Another way would be to submit the packets in longer sequences on each path, as suggested for the string mode protocol, Dejean et al. (1991). Yet another solution would be to spread the traffic dynamically over the paths, Cheng (1994); this variant would, however, require substantially more overhead. Essentially, the spreading strategy should be matched to the traffic characteristics so as to minimize the correlation in the resulting traffic streams, since lowering the correlation is one of the main advantages of traffic dispersion.
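Cyclic dispersion and resequencing can be illustrated with a short Python sketch. It is our own simplification, not one of the cited protocols, and works in three steps:

1. each cell is tagged with a sequence number;
2. cells are assigned to the N paths in round-robin order;
3. the receiver restores the original order by sorting on the tag.

A real resequencer would instead release cells incrementally from a bounded buffer. The function names are hypothetical.

```python
from typing import Any, List, Sequence, Tuple

Tagged = Tuple[int, Any]             # (sequence number, cell payload)


def disperse_cyclic(cells: Sequence[Any], n_paths: int) -> List[List[Tagged]]:
    """Assign cell number s to path s mod n_paths, tagging it with s."""
    paths: List[List[Tagged]] = [[] for _ in range(n_paths)]
    for seq, cell in enumerate(cells):
        paths[seq % n_paths].append((seq, cell))
    return paths


def resequence(paths: List[List[Tagged]]) -> List[Any]:
    """Merge the per-path streams and restore the original order by sequence number."""
    merged = [item for path in paths for item in path]
    merged.sort(key=lambda item: item[0])
    return [cell for _, cell in merged]


if __name__ == "__main__":
    paths = disperse_cyclic(list("ABCDEFGH"), 3)
    print(paths[0])                  # cells 0, 3 and 6 travel on the first path
    print("".join(resequence(paths)))
```

Every path carries every N-th cell, so a burst sent at peak rate h appears on each path at roughly h/N. This is the approximation used for a dispersed source in section 2.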

Spatial traffic dispersion thus improves statistical multiplexing. It also enhances network security, as eavesdropping on several connections simultaneously may be difficult. Since the dispersion scheme employs disjoint paths, cell loss on one connection is independent of losses on other connections, and forward error correction can successfully be used to correct the losses. Given these advantages, the question arises whether traffic dispersion is useful under all circumstances.

To answer this question, or at least give a hint, we have chosen to focus on how traffic dispersion affects the equivalent capacity of a transmission. The equivalent capacity is the predicted capacity that needs to be allocated on a link for given source characteristics and performance demands. We investigate for which values of source peak rate and source mean rate, and of their relation to the link capacity, spatial traffic dispersion is useful.

Equivalent  capacity  is  discussed  in  Section  2,  while  Section  3  covers  the  cost  functions  used 
in  the  evaluations.  Our  results  are  presented  in  Section  4,  and  Section  5  concludes  the  paper. 


2  EQUIVALENT  CAPACITY 

2.1  Equivalent  capacity  without  buffering 

The ATM concept makes use of statistical multiplexing, which allows multiple sources to share a link statistically. This means that the demand for capacity may at times exceed the available resources on the link, and cells will be lost. Given a limit on the cell-loss probability, we can calculate the maximum number of identical sources n which can be multiplexed on a link of capacity C. The equivalent capacity required for one source is then C/n.

If  the  objective  is  to  maintain  the  traffic  arrival  rate  below  the  link  capacity,  that  is,  assuming 
no  buffering,  the  cell-loss  probability  may  be  approximated  by: 


$$\varepsilon \approx \frac{E\{(\Lambda - C)^{+}\}}{E\{\Lambda\}} \tag{1}$$

where Λ is the arrival rate and (Λ − C)⁺ = max{0, Λ − C}.

Define the arrival process to consist of n independent, identically distributed on-off sources, each with peak rate h (Figure 2). The model is chosen to obtain a tractable expression for the equivalent capacity while capturing some of the burstiness that can be anticipated from future traffic sources.


Figure 2 The state diagram of an on-off source. The system stays off with probability p_off, and once on, it remains on with probability p_on.

While in the active state, the source generates traffic at peak rate h. The mean rate of the source is thus given by ξh, where ξ is the fraction of time that the source spends in the active state:

$$\xi = \frac{1 - p_{\mathrm{off}}}{2 - p_{\mathrm{on}} - p_{\mathrm{off}}} \tag{2}$$
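Equation (2) is the stationary probability of the "on" state of the two-state chain in Figure 2. The Python sketch below (hypothetical function names, fixed random seed) evaluates (2) and checks it against a direct simulation of the chain. It assumes one transition per time step and 0 ≤ p_on, p_off < 1.

```python
import random


def activity(p_on, p_off):
    """Eq. (2): stationary probability that the on-off source is active."""
    return (1.0 - p_off) / (2.0 - p_on - p_off)


def simulated_activity(p_on, p_off, steps, seed=1):
    """Fraction of time steps spent in the active state by the chain of Figure 2
    (stay on with probability p_on, stay off with probability p_off)."""
    rng = random.Random(seed)
    on = False
    active = 0
    for _ in range(steps):
        stay = p_on if on else p_off
        if rng.random() >= stay:     # leave the current state
            on = not on
        if on:
            active += 1
    return active / steps


if __name__ == "__main__":
    p_on, p_off = 0.8, 0.95
    print(activity(p_on, p_off), simulated_activity(p_on, p_off, 200_000))
```

The mean rate of the source then follows as ξh.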

As the number of simultaneously active sources is binomially distributed, the cell-loss probability is given by

$$\varepsilon = \frac{1}{n\xi h}\sum_{x=\lceil C/h\rceil}^{n} (xh - C)\,\Pr\{x \text{ sources on}\} = \frac{1}{n\xi h}\sum_{x=\lceil C/h\rceil}^{n} (xh - C)\binom{n}{x}\xi^{x}(1-\xi)^{n-x} \tag{3}$$

The burstiness of a source is in these equations defined as

$$\frac{\text{source peak rate}}{\text{source mean rate}} = \frac{h}{\xi h} = \frac{1}{\xi} \tag{4}$$

and the cell-loss probability is hence dependent on the ratio C/h as well as on the source burstiness.
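Equation (3), divided through by h, depends only on C/h and ξ = 1/burstiness. The maximum number of sources n, and with it the equivalent capacity C/n, can therefore be found numerically. This Python sketch evaluates the binomial terms in log space so that large n does not overflow. It finds n by exponential search followed by bisection, assuming (not proving) that the loss is nondecreasing in n. The function names are hypothetical.

```python
import math


def cell_loss(n, c_over_h, xi):
    """Eq. (3) in units of h: sum over x > C/h of (x - C/h) * P{x on}, divided by n*xi."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly between 0 and 1")
    log_xi, log_off = math.log(xi), math.log1p(-xi)
    excess = 0.0
    for x in range(math.floor(c_over_h) + 1, n + 1):   # terms with x*h <= C are zero
        log_p = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + x * log_xi + (n - x) * log_off)       # log of the binomial pmf
        excess += (x - c_over_h) * math.exp(log_p)
    return excess / (n * xi)                            # E{Lambda}/h = n*xi


def max_sources(c_over_h, xi, eps=1e-9):
    """Largest n with cell_loss(n) <= eps (loss assumed nondecreasing in n)."""
    if c_over_h < 1:
        raise ValueError("the link must carry at least one peak-rate source")
    lo = math.floor(c_over_h)          # this many sources can never overload the link
    hi = lo + 1
    while cell_loss(hi, c_over_h, xi) <= eps:   # grow until the target is violated
        lo, hi = hi, 2 * hi
    while hi - lo > 1:                 # bisection: lo meets eps, hi does not
        mid = (lo + hi) // 2
        if cell_loss(mid, c_over_h, xi) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    c_over_h, burstiness = 10, 10      # peak/link = 1/10, peak/mean = 10
    n = max_sources(c_over_h, 1 / burstiness)
    print(f"{n} sources fit; equivalent capacity per source = C/{n} = {1 / n:.4f} C")
```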

Since the correlation between cells generated by the source in Figure 2 is monotonically decreasing, cyclic dispersion would minimize the correlation on each path. This is thus the preferred dispersion strategy, and it is assumed in the remainder of this paper. A dispersed source, as the link experiences it, may be approximated by another on-off source with the same characteristics, except that the peak rate is reduced to h/N, N being the dispersion factor. The dispersion factor is defined as the number of paths over which the traffic from a source is spread. In the following, a dispersed source represents the traffic that an original source sends over one of the paths. We show the effects of dispersion on the equivalent capacity by replacing each original on-off source by N independent dispersed sources.

Essentially, we model an on-off source with peak rate h as N on-off sources, each with peak rate h/N (Figure 3). These sources would be completely correlated, since together they represent the original source. With dispersion, however, the traffic from each of these N sources is sent over a separate path, disjoint from all the other paths. Each link is therefore only affected by the traffic from one of the dispersed sources, and this source can be seen as the fraction of traffic that the original source sends over that specific link. To obtain the same load on a link with dispersion as without, we assume that the link now carries the traffic from N times as many independent dispersed sources, instead of the traffic from a number of independent original sources. That is, one link carries fractions of the traffic from each of N times as many original and independent sources. This justifies the independence assumption used in the capacity calculations.

Figure 3 Modelling dispersed traffic sources. Without dispersion, a number of sources are
multiplexed  on  a  certain  link,  while  in  the  case  of  dispersion,  N  times  as  many  dispersed 
sources  are  multiplexed  on  the  same  link.  The  amount  of  traffic  that  the  link  carries  is  hence 
kept  constant. 

We  can  now  calculate  the  equivalent  capacity  for  a  dispersed  source,  but  we  multiply  it  by  N 
in  order  to  get  a  fair  comparison  with  the  capacity  for  an  original  source.  From  (3),  we  get  the 
cell-loss  probability  for  n  sources,  each  with  peak  rate  h/N  as 


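$$\varepsilon = \frac{N}{n\xi h}\sum_{x=\lceil NC/h\rceil}^{n}\left(\frac{xh}{N} - C\right)\binom{n}{x}\xi^{x}(1-\xi)^{n-x} \tag{5}$$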


For  a  given  cell-loss  probability,  the  capacity  required  for  each  source  on  each  path  is  C/n, 
and  the  total  capacity  required  for  a  source  is  NC/n.  This  is  the  capacity  presented  in  the  Figures. 
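The quantity plotted in Figure 4 can be sketched numerically. A dispersed source has peak rate h/N, so (5) is (3) with C/h replaced by N·C/h. The total capacity N·C/n of a dispersed source is then compared with the undispersed C/n. The Python sketch below makes the following assumptions:

- it uses the zero-buffer model above with a loss target of 10⁻⁹;
- it assumes the loss is nondecreasing in n;
- its function names are hypothetical.

It only illustrates the calculation and does not reproduce the published curves.

```python
import math


def loss(n, c_over_h, xi):
    """Zero-buffer cell-loss probability (3) for n on-off sources, in units of the
    peak rate: sum over x > C/h of (x - C/h) * P{x on}, divided by n*xi."""
    log_xi, log_off = math.log(xi), math.log1p(-xi)
    total = 0.0
    for x in range(math.floor(c_over_h) + 1, n + 1):
        log_p = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + x * log_xi + (n - x) * log_off)
        total += (x - c_over_h) * math.exp(log_p)
    return total / (n * xi)


def max_sources(c_over_h, xi, eps):
    """Largest n meeting the loss target (loss assumed nondecreasing in n)."""
    lo = math.floor(c_over_h)              # lo sources can never overload the link
    hi = lo + 1
    while loss(hi, c_over_h, xi) <= eps:   # grow until the target is violated
        lo, hi = hi, 2 * hi
    while hi - lo > 1:                     # bisection: lo meets eps, hi does not
        mid = (lo + hi) // 2
        if loss(mid, c_over_h, xi) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


def relative_capacity(c_over_h, burstiness, N, eps=1e-9):
    """Total capacity N*C/n of one source dispersed over N paths, normalised to the
    undispersed C/n. A dispersed source has link-to-peak ratio N*C/h, as in (5)."""
    if c_over_h < 1 or burstiness <= 1:
        raise ValueError("need C >= h and a bursty source (burstiness > 1)")
    xi = 1.0 / burstiness
    n_plain = max_sources(c_over_h, xi, eps)
    n_dispersed = max_sources(N * c_over_h, xi, eps)
    return N * n_plain / n_dispersed


if __name__ == "__main__":
    for c_over_h in (5, 10):               # peak/link = 1/5 and 1/10
        for b in (2, 10):                  # burstiness = peak/mean
            row = [round(relative_capacity(c_over_h, b, N), 3) for N in (1, 2, 4, 8)]
            print(f"peak/link=1/{c_over_h}, burstiness={b}: {row}")
```

Values below 1 indicate that dispersion lowers the equivalent capacity. Because n is an integer, individual entries need not be monotone in N, as discussed next for the upper left graph of Figure 4.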

Figure 4 shows, for each value of N, the aggregated equivalent capacity of N dispersed sources. The equivalent capacity for one source without dispersion is normalized to one. This implies that the graphs for different values of C/h and burstiness are not directly comparable; they are only intended to show the multiplexing gain obtained by dispersion for different values of burstiness and source peak rate. The upper left graph shows a small increase in equivalent capacity for N = 2 compared to N = 1. In this case the equivalent capacity for a dispersed source is lower than the capacity for a non-dispersed source, but it still exceeds fifty percent of that value; multiplied by two, it hence causes an increase in equivalent capacity. Since the peak-to-link ratio is very high, only a few sources fit, given the zero-buffer assumption and the stringent loss requirement (10⁻⁹). The peak in the graph is basically due to the fact that the number of sources has to be an integer. The truncation has a proportionally larger effect here because the number of multiplexed sources is very low (6 in the case without dispersion).
The figure shows that traffic dispersion decreases the equivalent capacity for a connection.
When dealing with statistical multiplexing, the most troublesome traffic sources are those with
a high peak-to-mean ratio and those with a high peak-to-link ratio, because it becomes
extremely difficult to predict how much capacity must be allocated to fulfil the performance
requirements of heavily fluctuating sources. These are precisely the sources for which the
benefits of traffic dispersion are most significant.


(Figure 4 panels: Peak/link = 1/5, 1/10, 1/50, 1/100)


Figure  4  Equivalent  capacity  for  different  degrees  of  dispersion,  and  different  values  of  source 
burstiness  and  peak-to-link  ratio.  The  burstiness  is  defined  as  the  source’s  peak  rate  divided  by 
its  mean  rate,  and  ‘Peak/link’  denotes  the  source  peak  rate  divided  by  the  link  capacity.  The 
cell-loss probability was set to 10^-9.




Part  Two  Traffic  and  Congestion  Control 


In Figure 4, the values were normalized. Table 1 shows the equivalent capacity without dispersion
(N=1) before normalization, given the link capacity C=1. For a given column, the peak rate is
constant, while the mean rate decreases from row to row in order to make the source more bursty.
For a given row, both the peak rate and the mean rate decrease from column to column, to make the
source less dominant on the link. Table 2 shows the peak rates h and mean rates ξh corresponding
to the capacity values in Table 1.

A comparison of the two tables shows that the worst source behaviour is in the lower left box,
while the best behaviour is in the upper right box. In the lower left box, the source burstiness is
high, causing large fluctuations in the traffic, and the peak rate occupies a large part of the link
capacity. The equivalent capacity for such a source is close to the peak rate. On the other hand,
the source in the upper right box causes small fluctuations and never demands more than a small
fraction of the link, so its equivalent capacity is close to the mean rate.


Table 1 Equivalent capacity before normalization; C=1, N=1, cell-loss probability 10^-9
Table 2 Source mean rate
It might also be interesting to study the influence of the cell-loss probability on the results. In
the calculations discussed above, the cell-loss probability was kept constant and equal to 10^-9.
Figure 5 shows that if we increase the cell-loss probability to 10^-3, traffic dispersion still reduces
the capacity as in Figure 4, but not to the same extent as with the lower cell-loss probability.
This might be because a higher tolerance of loss allows more sources to be multiplexed on the
same link, so the increase in the number of sources that dispersion makes possible becomes
less significant.
(Figure 5 panels: Peak/link = 1/5, 1/10, 1/50)


Figure 5 Equivalent capacity for some different values of source burstiness and peak-to-link
ratio. The cell-loss probability was set to 10^-3.


2.2  Equivalent  capacity  with  buffering 


The discussion so far has been for a zero-buffer assumption, but it may also be of interest to look
at the case where a single on-off source generates input traffic to a link with buffer capacity X.
Guerin et al. (1991) give an upper bound on the equivalent capacity c of such a connection:


$$ c = h\,\frac{y - X + \sqrt{(y - X)^2 + 4\xi X y}}{2y}, \qquad \text{where } y = T_{on}(1-\xi)\,h\,\ln\frac{1}{\varphi} \tag{6} $$

T_on is the average duration of an active period, φ is the probability of buffer overflow (cell
loss), and h and ξ are defined as before. Since this case concerns only a single source, it might
not be suitable to model dispersion as in the previous section, where each original source was
replaced by a number of sources with lower peak rates. We therefore choose to model a dispersed
source by an on-off source with a fixed mean rate but with a correlation function that changes
with the dispersion factor.
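A direct transcription of (6) may help when reproducing the single-source numbers. This Python sketch takes the symbols as defined above (h in cells per time unit, T_on in time units, X in cells); the function name is ours.

```python
from math import log, sqrt


def equivalent_capacity_single(h: float, xi: float, t_on: float,
                               buffer_x: float, phi: float) -> float:
    """Upper bound (6) on the equivalent capacity of a single on-off source.

    h: peak rate, xi: probability of being active, t_on: mean active period,
    buffer_x: buffer size in cells, phi: target probability of buffer overflow.
    """
    y = t_on * (1.0 - xi) * h * log(1.0 / phi)
    d = y - buffer_x
    return h * (d + sqrt(d * d + 4.0 * xi * buffer_x * y)) / (2.0 * y)


if __name__ == "__main__":
    # Burstiness 100 (mean rate 1), T_on = 200, phi = 1e-9, three buffer sizes.
    for x in (1e3, 1e5, 1e7):
        print(x, equivalent_capacity_single(100, 0.01, 200, x, 1e-9))
```

For h=100, ξ=0.01, T_on=200 and φ=10^-9, it gives roughly 99.8 for X=10^3 and 1.04 for X=10^7, illustrating how c approaches the peak rate for small buffers and the mean rate for large ones.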

Recall the on-off source from Figure 2. Define u(i) to be the number of cells generated within
the i-th time unit, so that u(i) is either 0 or h. The correlation sequence of the source is
given by


$$ r(k) = E\{u(i+k)\,u(i)\} - h^2\xi^2 = h^2\,\frac{(1-p_{on})(1-p_{off})}{(2-p_{on}-p_{off})^2}\,(p_{on}+p_{off}-1)^k \tag{7} $$
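Because Figure 2 is not reproduced here, the sketch below assumes that p_on and p_off are the probabilities of remaining in the on and off states from one time unit to the next, which is consistent with a mean active period T_on = 1/(1 − p_on). Under that assumption it checks the closed form of (7) against a direct calculation from the two-state Markov chain; both function names are hypothetical.

```python
def correlation_closed_form(k: int, h: float, p_on: float, p_off: float) -> float:
    """r(k) in the form of (7); p_on / p_off = probability of staying on / off."""
    return (h * h * (1.0 - p_on) * (1.0 - p_off) / (2.0 - p_on - p_off) ** 2
            * (p_on + p_off - 1.0) ** k)


def correlation_from_chain(k: int, h: float, p_on: float, p_off: float) -> float:
    """E{u(i+k)u(i)} - (h*xi)^2, obtained by stepping the two-state chain k times."""
    xi = (1.0 - p_off) / (2.0 - p_on - p_off)     # stationary probability of "on"
    p_still_on = 1.0                              # P(on at i | on at i)
    for _ in range(k):
        # P(on at t+1) = P(on at t) * p_on + P(off at t) * (1 - p_off)
        p_still_on = p_still_on * p_on + (1.0 - p_still_on) * (1.0 - p_off)
    return h * h * xi * p_still_on - (h * xi) ** 2
```

Both routes reduce to h²ξ(1 − ξ)λ^k with λ = p_on + p_off − 1, which is why only the factor λ^k changes when the peak and mean rates are held fixed.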




The more correlated the traffic is, the more difficult it becomes to handle. When dispersing
the traffic, the objective is therefore to minimize the correlation in the cell stream on each path.
As the correlation sequence of an on-off source is monotonically decreasing, the minimization
is obtained by distributing the generated cells cyclically over the paths, as mentioned before
(Figure 6).


Figure 6 Dispersing the cells cyclically over N disjoint paths.

The correlation sequence of the cells on one of the paths is hence given by Gustafsson
(1994:2)

$$ r_d(k) = E\{u(iN+kN)\,u(iN)\} - h^2\xi^2 = r(kN) \tag{8} $$


In  order  to  study  the  behaviour  of  the  equivalent  capacity  for  different  degrees  of  dispersion, 
we  model  the  traffic  from  a  dispersed  source  on  a  certain  path.  This  is  achieved  by  keeping  the 
peak  and  mean  rates  of  an  on-off  source  constant,  while  reducing  the  correlation  according  to 
(8). The amount of traffic transmitted on a link during a certain time interval thereby remains
unaltered,  while  the  traffic  behaviour  varies  under  the  influence  of  dispersion. 

The  fraction  of  time  that  the  source  spends  in  active  state  can  be  written  as 
and keeping ξ constant thus means keeping

The only part of r(k) that varies when the source peak and mean rates are fixed is hence
(p_on + p_off - 1)^k.

Given the transition probabilities for a non-dispersed source (N=1), we can calculate the
probabilities for dispersion factor N by

(11)


By adjusting the source characteristics according to (10) and (11), we show the effects of dispersion
on the equivalent capacity from (6). We have chosen a numerical example with an on-off
source whose T_on is about 200 time units.
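Equations (10) and (11) are not spelled out in this section. One reading consistent with (7) and (8) is that dispersion over N paths keeps ξ fixed while replacing the factor λ = p_on + p_off − 1 by λ^N, which gives 1 − p_on' = (1 − ξ)(1 − λ^N) and 1 − p_off' = ξ(1 − λ^N). The Python sketch below implements that assumed mapping, again treating p_on and p_off as staying probabilities, and checks that the resulting source satisfies r_d(k) = r(kN); it is not necessarily the authors' exact form, and the function names are ours.

```python
from typing import Tuple


def dispersed_transitions(p_on: float, p_off: float, N: int) -> Tuple[float, float]:
    """Staying probabilities for dispersion factor N that keep xi fixed and
    replace the decay factor lam = p_on + p_off - 1 by lam**N (assumed form)."""
    lam = p_on + p_off - 1.0
    xi = (1.0 - p_off) / (2.0 - p_on - p_off)
    gap = 1.0 - lam ** N            # (1 - p_on') + (1 - p_off') after dispersion
    return 1.0 - (1.0 - xi) * gap, 1.0 - xi * gap


def correlation(k: int, h: float, p_on: float, p_off: float) -> float:
    """Autocovariance r(k) = h^2 * xi * (1 - xi) * lam**k of a two-state on-off source."""
    xi = (1.0 - p_off) / (2.0 - p_on - p_off)
    return h * h * xi * (1.0 - xi) * (p_on + p_off - 1.0) ** k


if __name__ == "__main__":
    p_on, p_off = 0.995, 1 - 0.005 * 0.01 / 0.99   # T_on = 200, xi = 0.01
    for N in range(1, 6):
        q_on, _ = dispersed_transitions(p_on, p_off, N)
        print(N, "mean active period:", 1.0 / (1.0 - q_on))
```

With the parameters of the numerical example (T_on = 200, ξ = 0.01), the mean active period 1/(1 − p_on') of the dispersed source shrinks as N grows, which is how this model shortens bursts while leaving the peak and mean rates unchanged.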
Previous results have shown that the queue size is highly dependent on the correlation in the
traffic, Li and Mark (1990). In order to facilitate a comparison among the different graphs, we
therefore keep the correlation fixed in all cases where there is no dispersion, that is, the first bar
in each graph. This means that the value of T_on, given by

$$ T_{on} = \frac{1}{1 - p_{on}} \tag{12} $$

will vary, because changing the burstiness while keeping the correlation fixed for N=1 means
that the value of p_on will change as well.
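The effect of fixing the correlation on T_on can be checked numerically. The sketch assumes, as before, that p_on and p_off are staying probabilities and that the burstiness of an on-off source equals 1/ξ; it calibrates λ = p_on + p_off − 1 at burstiness 100 with T_on = 200 and then evaluates T_on = 1/(1 − p_on) for the other burstiness values. The function name is hypothetical.

```python
def t_on_for_burstiness(burstiness: float, lam: float) -> float:
    """Mean active period 1 / (1 - p_on) when lam = p_on + p_off - 1 is held fixed.

    For an on-off source, burstiness = peak / mean = 1 / xi, and keeping both
    xi and lam fixed gives 1 - p_on = (1 - xi) * (1 - lam).
    """
    xi = 1.0 / burstiness
    return 1.0 / ((1.0 - xi) * (1.0 - lam))


if __name__ == "__main__":
    # Calibrate lam at burstiness 100 with T_on = 200, i.e. 1 - p_on = 0.005.
    lam = 1.0 - 0.005 / (1.0 - 0.01)
    for b in (5, 10, 50, 100):
        t_on = t_on_for_burstiness(b, lam)
        print(b, round(t_on, 1), "mean burst size:", round(t_on * 100), "cells")
```

It returns 247.5, 220, about 202 and 200 time units for burstiness 5, 10, 50 and 100, in line with the T_on values listed in Table 3; with h = 100 the mean burst for burstiness 100 is 20 000 cells.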

We try to make the comparison among different traffic cases as fair as possible. One solution
would be to keep the peak rate of the source constant, which means that the mean rate must be
varied when the burstiness varies. The results obtained under these conditions are shown in Figure
7, where the equivalent capacity for N=1 is normalized to one. The peak rate is constant and set
to 100, and φ is 10^-9. Since the peak-to-link ratio is not considered in (6), the graphs in Figure
7 are not directly comparable to those in Figure 4. Additional results, which are not presented
here, show that the equivalent capacity behaves similarly if the mean rate is kept constant and
the peak rate is changed instead, Gustafsson (1995).


(Figure 7 panels: Burstiness 5, 10, 50, 100)


Figure  7  Equivalent  capacity  for  different  degrees  of  dispersion  and  burstiness,  for  three 
different  buffer  sizes.  The  source  peak  rate  is  constant  and  set  to  100. 
Table 3 shows the values from Figure 7 before normalization, for the cases without dispersion.


Table 3 Equivalent capacity before normalization; N=1, h=100


Buffer size X   Burstiness 5   Burstiness 10   Burstiness 50   Burstiness 100
                Mean rate 20   Mean rate 10    Mean rate 2     Mean rate 1
                T_on = 248     T_on ≈ 220      T_on ≈ 202      T_on = 200
10^3            99.8           99.8            99.8            99.8
10^5            77.4           76.5            75.8            75.7
10^7            20.5           10.3            2.01            1.04

The results above show that when the buffer size is extremely small, the equivalent capacity
for a connection approaches the source peak rate. Because of the buffer limitation, it becomes
difficult to stay within the allowed probability of overflow, even when dispersion is used. In the
example above, the average burst size is about 20 000 cells (T_on·h ≈ 200·100), while the buffer
holds only 1000 cells. When a burst arrives, it therefore fills up the buffer rather fast, and in order
to keep the cell loss at a low level, the output rate of the buffer must be very high. By increasing
the dispersion factor further, we can reduce the average burst size far below the buffer size, to a
point where the traffic is almost completely uncorrelated, but it turns out that the decrease in
equivalent capacity stays at about 45-50%. An explanation for this might be that the formula only
considers a single source, so there are no capacity gains due to multiplexing, such as those
obtained when several sources share a link. This might also explain why the capacity reductions
are not similar to the ones obtained without buffering, since in that case multiple sources were
multiplexed together on a link. In summary, with one single source and a small buffer, dispersion
cannot significantly improve the situation, at least not for a dispersion factor smaller than ten.

If  on  the  contrary  the  buffer  size  is  extremely  large,  the  equivalent  capacity  approaches  the 
source  mean  rate.  Since  the  capacity  can  never  be  lower  than  the  mean  rate,  dispersion  is  of  very 
little help in this case. It should be noted, however, that such low capacity values can be obtained
because  the  buffer  is  large  enough  to  hold  entire  bursts.  The  penalty  for  this  is  long  delays. 

When the buffer size lies somewhere between these two extremes, traffic dispersion is useful,
and the equivalent capacity under the influence of dispersion follows the same tendency as
without buffering. That is, as the source burstiness increases, the gain obtained by dispersion
increases too.

Next, we consider the equivalent capacity of a number of connections multiplexed on the
same link. The capacity could be approximated by the sum of the individual capacities,
that is

$$ C = \sum_{i=1}^{n} c_i \tag{13} $$

Unless the equivalent capacity of each individual connection is very close to the source mean
rate, the capacity according to (13) in many cases overestimates what actually needs to be
allocated. This is because the method does not consider the effects of statistical multiplexing.
Guerin et al. (1991) present the following approximation for the equivalent capacity of n
multiplexed on-off sources:

$$ C = \min\left\{ n\xi h + \sigma\sqrt{-2\ln\varphi - \ln(2\pi)},\; \sum_{i=1}^{n} c_i \right\} \tag{14} $$


The term nξh denotes the mean aggregate bit rate of the connections, and σ is the standard
deviation of the aggregate bit rate, that is


$$ \sigma^2 = \sum_{i=1}^{n} \sigma_i^2 = n\sigma_i^2 = n\,h^2\xi(1-\xi) \tag{15} $$


As long as we keep the source peak and mean rates constant, the first part of (14) will not be
affected by dispersion. In order to investigate the effects of dispersion on the equivalent
capacity in (14), we therefore recall the model of a dispersed source that was used in Section 2.1.
By replacing each original source with peak rate h by N sources, each with peak rate h/N, the
first part of (14) becomes

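$$ N \cdot n\xi \cdot \frac{h}{N} + \sigma_d\sqrt{-2\ln\varphi - \ln(2\pi)} = n\xi h + \sigma_d\sqrt{-2\ln\varphi - \ln(2\pi)}, \qquad \text{where} \tag{16} $$

$$ \sigma_d^2 = Nn\left(\frac{h}{N}\right)^2 \xi(1-\xi) = \frac{n h^2 \xi(1-\xi)}{N} \tag{17} $$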


Figure 8 shows how dispersion affects the equivalent capacity of n multiplexed connections,
according to (14). The results presented are for n=10, 100 and 1000, and the values are normalized
to one for N=1. We have chosen the buffer size X=100 000, since this was the case where
dispersion made a significant difference to the results in the previous discussion. Further results,
not shown here, indicate that we also get a capacity reduction when reducing the
buffer size to about 1000, even though the reduction in that case is slightly smaller than with a
larger buffer.
Figure 8 Equivalent capacity for different degrees of dispersion and burstiness. The peak rate
of a source is constant and set to 100, the probability of buffer overflow φ is 10^-9, and the
buffer size is X=100 000.

Table 4 shows the equivalent capacity from Figure 8 before normalization for N=1. In the table,
we show the aggregate equivalent capacity for n sources divided by the number of sources,
n. The values in the table can therefore be seen as the equivalent capacity of one of the n sources.


Table 4 Equivalent capacity before normalization; N=1, h=100


Number of      Burstiness 5   Burstiness 10   Burstiness 50   Burstiness 100
sources, n
10             77.5           69.7            29.9            20.8
100            45.2           28.9            10.8            7.26
1000           28.0           16.0            4.79            2.98
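The Gaussian (first) term of (14), combined with the dispersion model of (16) and (17), is easy to evaluate. The Python sketch below assumes φ = 10^-9 and h = 100 as in Figure 8, divides by n as in Table 4, and keeps the table entries for comparison; the function and dictionary names are ours. The entry for n=10 and burstiness 5 is left out, because there the Gaussian term (about 99.6) exceeds the tabulated 77.5, so the second term of (14) is presumably the smaller one.

```python
from math import log, pi, sqrt


def gaussian_term_per_source(n: int, xi: float, h: float, phi: float,
                             N: int = 1) -> float:
    """First term of (14) divided by n, with each source split into N
    dispersed sources of peak h/N as in (16)-(17)."""
    mean = n * xi * h                                 # unchanged by dispersion
    sigma_d = sqrt(n * h * h * xi * (1.0 - xi) / N)   # (17): variance shrinks by 1/N
    alpha = sqrt(-2.0 * log(phi) - log(2.0 * pi))
    return (mean + alpha * sigma_d) / n


# Table 4 entries (h = 100, phi = 1e-9, N = 1), keyed by (n, burstiness).
TABLE_4 = {
    (10, 10): 69.7, (10, 50): 29.9, (10, 100): 20.8,
    (100, 5): 45.2, (100, 10): 28.9, (100, 50): 10.8, (100, 100): 7.26,
    (1000, 5): 28.0, (1000, 10): 16.0, (1000, 50): 4.79, (1000, 100): 2.98,
}

if __name__ == "__main__":
    for (n, b), expected in TABLE_4.items():
        value = gaussian_term_per_source(n, 1.0 / b, 100, 1e-9)
        print(n, b, round(value, 2), expected)
```

Passing N > 1 shrinks only the standard-deviation part, by a factor 1/√N, so the per-source value falls towards the mean rate ξh as N grows.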
Comparing these results to those in Figure 7 and Table 3, we find that (14) gives significantly
lower capacity values than (13) for a large number of sources with high burstiness. This is the
situation in which the effects of statistical multiplexing become visible.

Furthermore, dispersion causes larger capacity reductions when the source burstiness is high
and only a small number of sources are multiplexed together. The similarity between this
behaviour and the one that appeared in Figure 4 is striking. An increasing number of
sources (larger n) means that the ratio between the source peak rate, which in this case is
constant, and the aggregate equivalent capacity C decreases. If we let this ratio correspond to the
peak-to-link ratio in Figure 4, we get exactly the same tendency as without buffering. This
means that we can draw the general conclusion that dispersion improves the equivalent capacity
particularly in the case of a small number of sources (high peak-to-link ratio) with high
burstiness (high peak-to-mean ratio).

The  results  presented  thus  show  that  for  a  suitable  buffer  size,  traffic  dispersion  reduces  the 
equivalent capacity for a connection. For very large buffers dispersion does not affect the
equivalent capacity, but will probably reduce the delay, and for very small buffers dispersion over a
modest number of paths cannot improve the situation, unless there are enough sources to obtain
multiplexing effects. When there is a capacity reduction, it behaves similarly with and without
buffering. We have therefore chosen to limit the following discussions to the results without
buffering, since they seem to represent a general behaviour.


3  COST  FUNCTIONS 

The previous section showed that a drastic decrease in equivalent capacity owing to traffic
dispersion is possible for bursty sources. Considering only the equivalent capacity might however
be somewhat optimistic, since spreading the traffic over several paths requires more virtual
circuits to be established, and causes additional signalling overhead. We will therefore weigh the
capacity obtained without buffering with three different cost functions, in order to establish
under what circumstances traffic dispersion is profitable.

The first cost function is a fixed charge per capacity unit (Figure 9 (a)). This cost is independent
of the number of connections used for a transmission, and the cost benefit curve will follow
the curves in Figure 4, scaled by a constant cost factor.


Figure  9  The  different  costs  considered:  a  fixed  cost  per  capacity  unit  (a),  a  fixed  cost  per 
connection  (b),  and  a  cost  increasing  with  the  number  of  connections  (c). 


Next,  we  consider  a  cost  function  which  is  composed  of  a  fixed  charge  per  capacity  unit,  and 
a  fixed  charge  per  connection  used  for  a  transmission  (Figure  9  (a)  and  (b)).  The  connection 
charge  is  motivated  by  the  extra  effort  needed  to  set  up  and  maintain  several  virtual  circuits  for 
each  transmission. 
The last cost function is composed of a fixed charge per capacity unit, and a progressively
increasing  charge  per  number  of  connections  used  (Figure  9  (a)  and  (c)).  Assume  that  without 
dispersion,  the  virtual  circuit  follows  the  shortest  path  through  the  network.  Since  there  might 
only  be  one  path  of  that  length,  the  additional  connections  needed  for  dispersion  will  have  to 
follow  longer  paths.  The  cost  increase  could  therefore  be  taken  as  a  penalty  for  using  longer  and 
longer  paths. 
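To make the three cost functions concrete, the sketch below picks the number of paths that minimises the total cost of a transmission. The capacity values are hypothetical illustration numbers, not read from Figure 4, and the quadratic growth used for cost function (c) is an assumption, since its exact shape is not specified here; the charges of 1 per capacity unit and 0.001 per connection follow the proportion used for Figure 10.

```python
from typing import Callable, Dict

# Hypothetical capacities per transmission (link units) for N = 1, 2, 3, 5, 10
# paths; illustrative numbers only, not values taken from Figure 4.
CAPACITY = {1: 1.0, 2: 0.4, 3: 0.3, 5: 0.25, 10: 0.249}


def cost_per_capacity(c: float, n_paths: int) -> float:
    return 1.0 * c                          # (a) fixed charge per capacity unit


def cost_per_connection(c: float, n_paths: int) -> float:
    return 1.0 * c + 0.001 * n_paths        # (a) + (b): fixed charge per connection


def cost_progressive(c: float, n_paths: int) -> float:
    return 1.0 * c + 0.01 * n_paths ** 2    # (a) + (c): assumed quadratic growth


def best_number_of_paths(capacity: Dict[int, float],
                         cost: Callable[[float, int], float]) -> int:
    """Number of paths with the lowest total cost for one transmission."""
    return min(capacity, key=lambda n: cost(capacity[n], n))
```

With these made-up numbers, the pure capacity charge favours the largest number of paths, the per-connection charge stops at five, and the progressive charge stops at three, which mirrors the qualitative behaviour discussed in Section 4.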


4  WHEN  IS  TRAFFIC  DISPERSION  USEFUL? 

We relate the different cost functions to the values of equivalent capacity that we obtained in
Section 2.1, to see whether dispersion is always motivated. If we only consider a fixed charge
per capacity unit and assume that there is no extra cost for using several connections (the first
cost function), traffic dispersion is practically always profitable, and the more paths used the
better. For sources with a high burstiness and a high peak-to-link ratio, the benefits of dispersion
are obvious; by spreading the traffic over only a handful of paths, a cost benefit of about eighty
to ninety percent is obtained. For sources with a low peak-to-link ratio, the benefits
are not as large: the gain here is only about thirty percent. However, when considering the
values of equivalent capacity without normalization in Table 1, we find that the cases where
the benefits of dispersion are least significant are those where the cost without dispersion is
already very low. In absolute terms, any larger gain in these cases would therefore be negligible
compared with the other cases. In other words: when the gain is needed, it is high.

With this cost situation, traffic dispersion over many paths is consequently always the best
solution. The assumption of no extra cost for extra paths may however not be quite realistic,
so we move on to the next cost function.

In this case, we have a fixed charge per capacity unit and a fixed charge per connection. The
balance between these two charges is very important. If the charge per connection is extremely small,
the results are the same as above, which means that dispersion is always profitable. If, on the
other hand, the charge per connection is too large, it will dominate the total cost and result in a cost
function that increases linearly with the number of paths. Traffic dispersion would then
never be justified.

More realistically, the charge per capacity unit will form the major part of the cost. In our
calculations, we have chosen the cost 1 per capacity unit and 0.001 per connection, as a hopefully
reasonable proportion. These values apply to a time unit equal to one call. Figure 10 shows the
result from applying such a cost function to some of the equivalent capacity values from Figure
4, without normalization.


When  is  traffic  dispersion  useful? 
Figure 10 The second cost function related to the equivalent capacity values from Figure 4.
The  cost  is  1  per  capacity  unit  and  0.001  per  connection. 


These results show again that traffic dispersion is very profitable in the cases where the
peak-to-link ratio is high. The maximum gain is about eighty percent for sources with high burstiness.
As the peak-to-link ratio decreases, so does the gain, and using a larger number of connections
even causes a small cost increase.
The conclusion is that dispersion should not be used in cases which can be handled well without it.
It is then better to allocate resources in a more conventional manner, using only one
path for each transmission. In the cases where traffic dispersion does give benefits, it should of
course be used. The results show that spreading the traffic over more than about two to five
paths does not give any remarkable further benefit, so the number of paths should preferably
be kept within this range.

With the chosen proportions in the cost function, the increased cost caused by several
connections is however practically negligible compared to the cost reduction obtained under other
conditions. When in doubt about whether dispersion should be used or not, it therefore seems better to
use it, since the penalty for dispersing when unnecessary is very small compared to the gain
obtained when dispersion turns out to be needed. In essence, the benefits from using dispersion in
the right place are many times larger than the penalty for using it in the wrong place.

The last cost function is a fixed charge per capacity unit and a charge which increases with
the number of connections. The behaviour resembles the case described previously in which the
connection charge is large: the charge for using several paths soon dominates the total cost, and a
transmission quickly becomes rather expensive. This is shown in Figure 11. In this example,
considerable benefits are still obtained for the higher peak-to-link ratios, but in the other cases
there is no gain at all. The best strategy under these circumstances seems to be to disperse the
traffic sparingly, and only when an economic gain is guaranteed.
Figure 11 The third cost function related to the equivalent capacity values from Figure 4.

So when is traffic dispersion useful? On the condition that the penalty for using several
connections does not dominate the total cost for a transmission, dispersion over a moderate number
of  paths  is  practically  always  profitable.  The  most  troublesome  traffic  sources  to  be  handled  by 
statistical  multiplexing  -  that  is,  as  mentioned  before,  those  with  a  high  burstiness  and  a  high 
peak-to-link  ratio  -  are  those  for  which  the  highest  gains  can  be  made. 
The ATM Forum has defined three traffic parameters, namely sustained bit rate, peak bit rate and
burst size. As a rough estimation, dispersion should be employed when the ratio between the
peak bit rate and the sustained bit rate (the source burstiness) is of the order of ten or more, and
when the peak bit rate exceeds one tenth of the link capacity. In all cases, the number of paths
used for a transmission should stay somewhere between two and five. The burst size is not
directly considered in this paper. It is true that the average duration of an active period is obtained
directly from the transition probabilities of an on-off source, and so is the probability that the
source is in active state. We have however only considered the probability of being in active
state, and it is possible to change that probability without affecting the duration of the active
period. However, longer bursts (longer active periods) indicate higher correlation in the traffic,
and this is where the benefits from dispersion are indisputable, Gustafsson (1994:2).

When dealing with sources having a low peak bit rate compared to the sustained bit rate, that
is, a ratio of less than ten to one, and a link capacity above ten times the peak bit rate, we might
just as well do without dispersion. The penalty for using dispersion in vain is however not
dramatic, and dispersion may be applied in uncertain cases. Lastly, it should be noted that users
may want to pay extra to get the traffic dispersed for reasons of security, or other reasons that
are not captured by the results presented in this paper.
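The rough rule of thumb above can be written as a small decision function. It is only a sketch: the thresholds are the approximate values quoted in the text (burstiness of about ten or more, peak rate above one tenth of the link capacity), treated here as hard cut-offs, and the function name is hypothetical. Since the penalty for dispersing in vain is small, a (1, 1) answer should be read as "dispersion optional" rather than "dispersion harmful".

```python
from typing import Tuple


def suggested_paths(peak_rate: float, sustained_rate: float,
                    link_capacity: float) -> Tuple[int, int]:
    """Rule of thumb from this section: (min, max) number of paths to use."""
    burstiness = peak_rate / sustained_rate     # peak-to-mean ratio
    peak_to_link = peak_rate / link_capacity
    if burstiness >= 10 and peak_to_link > 0.1:
        return (2, 5)                           # disperse over two to five paths
    return (1, 1)                               # one path suffices; dispersion optional
```

For instance, a source with peak rate 50 and sustained rate 2 on a link of capacity 155 (same units) falls in the "disperse" region, whereas a source with burstiness 5 does not.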


5  CONCLUSIONS 

This  paper  presents  spatial  traffic  dispersion  as  a  means  for  handling  difficult  traffic  sources, 
and  facilitating  resource  allocation.  The  use  of  dispersion  shows  a  large  gain  in  the  equivalent 
capacity,  and  when  relating  the  capacity  to  three  different  cost  functions,  the  benefits  are  in  most 
cases  confirmed. 

From  the  results  presented,  we  conclude  that  a  profit  due  to  dispersion  is  practically  always 
possible.  In  the  case  of  a  single  traffic  source,  there  are  no  multiplexing  effects.  On  the  one  hand, 
a  small  buffer  may  limit  the  gain  in  equivalent  capacity  to  an  extent  where  dispersion  over  a 
modest  number  of  paths  cannot  improve  the  values.  On  the  other  hand,  a  large  buffer  makes  the 
equivalent capacity close to the mean rate even without dispersion, at the expense of long delays. In
this case, dispersion could probably reduce the delay, but this is not reflected in the capacity results.

Furthermore,  we  conclude  that  the  cost  benefits  from  using  dispersion  are  most  important  for 
sources  with  a  high  peak-to-mean  ratio  (larger  than  ten),  and  a  high  peak-to-link  ratio  (larger 
than  one  tenth).  This  is  on  condition  that  the  charge  per  capacity  unit  dominates  over  the  cost 
for  using  several  connections.  The  penalty  for  using  dispersion  when  not  necessary  turns  out  to 
be small compared to the benefits from using dispersion where it is really needed. Traffic
dispersion is therefore useful in all cases where its benefits are beyond all doubt, as well as in all
uncertain cases.

We may also change our viewpoint from the user to the network operator. If a tariff structure
is imposed that erroneously penalizes traffic dispersion, statistical multiplexing may not be used
to its full potential in the network. For a user, the charge then depends on source behaviour
rather than on average use, since the equivalent capacity of the transmission is strongly dependent
on the burstiness of the source. The consequence is that the user may choose to keep the
connection for a shorter time, and set it up for individual bursts. The operator thus faces a situation
with low sharing of resources (fewer paying users simultaneously connected) and with
substantially more connection requests. As our example of equivalent capacity shows, traffic
dispersion basically makes the statistical link sharing immune to source behaviour. We therefore
hope that it will be widely employed as an antidote to new bursty traffic sources.
Abstract

Network  restoration  techniques  will  be  vital  to  ensure  B-ISDN  service  survivability  in 
the  event  of  high  capacity  link  and  node  failures.  Reliable  ATM  crossconnect  networks 
can  be  implemented  by  the  strategic  pre-assignment  of  protection  Virtual  Path  (VP) 
routes to permit recovery from a realistic subset of all possible failures, e.g. single span
failures.  The  method  of  protection  route  assignment  influences  the  quantity  of  redundant 
resources  like  spare  capacity  and  Virtual  Path  Identifiers  (VPIs),  whilst  nodal  hardware 
costs  are  incurred  due  to  the  requirement  of  pre-stored  alternate  routing  information.  In 
addition  to  implementation  costs,  the  impact  that  the  choice  of  rerouting  scheme  has  on 
other  factors  must  be  considered.  For  example,  the  degree  of  path  elongation  following 
restoration  may  adversely  affect  the  delay  performance  of  certain  connections.  Also,  the 
amount  of  computation  required  to  design  the  protection  routes,  and  the  effort  needed 
to  activate  such  routes  have  to  be  taken  into  account.  This  paper  formulates  metrics  to 
facilitate  a  comparative  evaluation  of  four  distinct  routing  strategies  for  VP  restoration, 
and  in  conjunction  with  a  discussion  of  qualitative  properties  of  each  scheme,  it  concludes 
that  failure  independent  rerouting  is  the  preferred  approach. 

Keywords 

ATM  Virtual  Paths,  restoration,  routing,  survivable  network  design 
1  INTRODUCTION

Because the potential repercussions of a cable break or node failure in a high capacity
broadband trunk network are so great, survivability is crucial (Wu, 1992). Restoration
is  the  process  of  re-establishing  trunk  groups  affected  by  a  failure  by  exploiting  spare 
capacity  at  diverse  locations  in  a  mesh  topology  (Veitch  et  al,  1995b).  This  is  realised 
by  high  speed  Digital  Crossconnect  Systems  (DCSs)  which  are  managed  centrally,  but 
also  have  the  capability  to  interact  in  a  distributed  fashion,  enabling  fast  restoration.  If 
restoration  is  rapid  enough,  active  calls  may  not  be  dropped.  Indeed,  a  target  completion 
time  of  2  seconds  would  ensure  preservation  of  the  majority  of  voice  connections  (Sosnosky, 
1994).  Recent  research  into  ATM  Virtual  Path  (VP)  restoration  suggests  that  progress 
can  be  made  in  achieving  very  fast  service  recovery  (Kawamura  et  al,  1994,  Anderson  et 
al, 1994, Veitch et al, 1995c). This is largely attributed to the logical nature of a Virtual
Path, which decouples routing and capacity assignment, making reconfiguration simple
compared with Synchronous Transfer Mode (STM) paths (Sato et al, 1990). This paper
focuses  on  VP  restoration  in  ATM  networks,  and  in  particular,  the  range  of  approaches 
to  pre-planning  alternate  routes  for  this  purpose. 

From  a  network  operator’s  point  of  view,  a  restoration  strategy  should  be  simple  to 
implement,  and  resource  efficient.  In  tandem  with  these  requirements,  the  scheme  should 
offer the subscriber fast service recovery from a wide range of failures. A suitable approach,
therefore, is to pre-assign restoration paths in advance of failure occurrence. This can be
performed  by  a  centralised  computer  with  a  global  view  of  the  network;  resources  can 
be  managed  efficiently  and  an  appropriate  subset  of  failures  can  be  selected  as  the  basis 
for  protection.  In  the  event  of  a  failure,  distributed  signalling  between  crossconnects  can 
be  used  to  achieve  very  fast  restoration  with  a  simple  protocol  since  it  is  only  necessary 
to  activate  pre-determined  routes.  Two  distinct  methods  of  establishing  protection  VP 
routes  which  have  been  identified  in  the  literature  are  categorised  as  failure  dependent 
(Anderson  et  al,  1994)  and  failure  independent  (Kawamura  et  al,  1994)  rerouting,  both 
of  which  are  defined  later.  Although  both  of  these  techniques  constitute  pre-assigned 
VP  restoration,  they  are  fundamentally  different  in  certain  aspects  of  implementation 
and performance, hence a formal comparison is essential. We focus on single span failure,
i.e. the simultaneous failure of all the transmission systems between two crossconnect
nodes. This assumption facilitates a fair comparison of schemes, since the
description  of  one  of  the  two  paradigms  studied  accounts  for  span  failures  only  (Anderson 
et  al,  1994). 

The  costs  of  implementing  a  particular  restoration  system  are  affected  by  the  required 
spare  resources  such  as  link/buffer  capacity  and  Virtual  Path  Identifiers  (VPIs),  as  well 
as the memory overheads required to support the pre-storage of alternate routing information.
In addition, different rerouting techniques can be assessed in terms of the
computational  effort  required  to  design  the  protection  plans,  as  well  as  the  signalling 
effort  needed  to  activate  protection  routes.  With  respect  to  performance,  the  choice  of 
rerouting  strategy  affects  the  user-perceived  quality  of  service,  since  restoration  often 
induces  path  elongation,  causing  increased  delays  and  cell  delay  variation.  Following  a 
simple description of the alternative rerouting schemes in section 2, a comparative evaluation
will be carried out in section 3 using metrics based on required spare capacity, VPI
redundancy, path length elongation, storage overheads and the computational effort of
protection route design. Section 4 discusses other qualitative factors that can be used to
compare the two distinct rerouting paradigms, including the signalling protocol and robustness
in the presence of uncertainty. Section 5 concludes the paper by arguing in favour of one
particular method, taking into account all the metrics quantified in section 3 as well as the
implementation aspects considered in section 4.


2  ALTERNATE  VP  REROUTING  SCHEMES 


Prior  to  providing  a  comparative  evaluation,  three  failure  dependent  rerouting  policies 
will  be  described,  followed  by  an  overview  of  the  failure  independent  rerouting  algorithm. 
Throughout,  it  is  assumed  that  the  working  VP  configuration  is  known  a  priori  and  that 
single  span  failure  protection  is  required. 


2.1  Failure  dependent  approaches 

With  failure  dependent  rerouting,  each  possible  single  span  failure  is  examined  in  turn, 
and  alternative  routes  are  subsequently  found  for  all  the  VPs  affected  by  the  failure.  A 
batch  alternate  route  planning  operation  of  this  kind  may  be  written: 

For each possible span failure
    For each failed VP
        Find alternate path according to rerouting policy
    End For
End For
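The batch planning loop above can be sketched in Python as follows. This is a minimal sketch, assuming spans are represented as frozensets of their two end nodes and that the rerouting policy (local, local-destination or source-based) is passed in as a function; the names `spans`, `plan_failure_dependent` and `policy` are illustrative, not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

Span = FrozenSet[Hashable]   # an undirected span, e.g. frozenset({1, 2})
Route = List[Hashable]       # a VP route given as a node sequence

def spans(route: Route) -> List[Span]:
    """Undirected spans traversed by a route."""
    return [frozenset(pair) for pair in zip(route, route[1:])]

def plan_failure_dependent(
    all_spans: List[Span],
    working: List[Route],
    policy: Callable[[Route, Span], Route],
) -> Dict[Tuple[Span, int], Route]:
    """Batch planning: one stored alternate route per (failed span, failed VP)."""
    plan: Dict[Tuple[Span, int], Route] = {}
    for failed in all_spans:                         # for each possible span failure
        for i, route in enumerate(working):          #   for each failed VP
            if failed in spans(route):
                plan[(failed, i)] = policy(route, failed)   # find alternate path
    return plan

# Toy usage with a placeholder policy (a real policy would search the topology).
ring = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]]
working = [[1, 2, 3], [3, 4]]
plan = plan_failure_dependent(ring, working, lambda route, failed: list(route))
print(len(plan))   # 3: one entry per span of every working VP
```

The number of stored entries equals the number of spans summed over all working VPs, which is the quantity counted by the RCE metric for failure dependent schemes.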

Hence,  there  is  a  unique  reconfiguration  associated  with  each  failure.  Alternate  routing 
data  in  the  form  of  VPI  and  link  ID  information  are  stored  in  databases  of  all  the  relevant 
crossconnect  nodes.  When  a  span  fails,  the  ID  of  the  failed  span  is  broadcast  to  all 
network  nodes  which  re-load  their  lookup  tables  with  the  relevant  data,  resulting  in  an 
asynchronous  logical  (i.e.  VP)  topology  update  (Anderson  et  al,  1994).  Three  separate 
versions  of  failure  dependent  rerouting  will  now  be  explained. 


2.1.1  Local  rerouting 

In this scheme, when a span fails, all the affected VPs are rerouted between the terminating
nodes of the span, without regard to the source and destination of the VPs (Figure 1 (a)).
Figure 1: Characteristics of local rerouting: (a) VP restored with local rerouting; (b) backhauling


Although the alternate routes are extremely simple to compute, and restoration is potentially
fast because the majority of message processing is carried out in the vicinity of the failure,
this is a greedy algorithm and can cause the undesirable phenomenon known as backhauling
(Figure 1 (b)). From the figure, failure of span 1-2 leads to an alternate route being computed
between nodes 1 and 2 as 1-3-4-2. The failed path 1-2-4 consequently uses the route 1-3-4-2-4,
meaning that span 2-4 is utilised twice.
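The greedy local rule, and the backhauling it causes, can be reproduced with a short Python sketch. It is an illustration under stated assumptions: shortest hop routes are found by breadth-first search with neighbours visited in sorted order (the paper uses a Floyd-Warshall computation with random tie-breaking), and the four-node topology contains only the spans named in the Figure 1 (b) example. `shortest_hop_path` and `local_reroute` are hypothetical helper names.

```python
from collections import deque

def shortest_hop_path(adj, src, dst, banned=frozenset()):
    """BFS shortest hop route from src to dst that avoids the banned spans.

    Neighbours are visited in sorted order, so ties are broken
    deterministically here (the paper breaks them at random).
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None                      # no surviving route
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def local_reroute(adj, route, failed):
    """Replace the failed span by a detour between its own end nodes."""
    for i in range(len(route) - 1):
        a, b = route[i], route[i + 1]
        if frozenset((a, b)) == failed:
            detour = shortest_hop_path(adj, a, b, banned={failed})
            if detour is None:
                raise ValueError("span failure cannot be restored locally")
            return route[:i] + detour + route[i + 2:]
    raise ValueError("VP does not use the failed span")

# Topology assumed from the Figure 1 (b) example: spans 1-2, 2-4, 1-3, 3-4.
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
restored = local_reroute(adj, [1, 2, 4], frozenset({1, 2}))
print(restored)   # [1, 3, 4, 2, 4]: span 2-4 is traversed twice
```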


2.1.2  Local-destination  rerouting 

A potentially more efficient result should be possible with a more sophisticated algorithm,
such as the “local-destination” rerouting proposed by AT&T (Anderson et al, 1994). When a span
fails, the failed VPs are rerouted starting from one of the span terminating nodes. The
destination of the alternate route, however, depends on the individual VP route, so as to
reduce resource consumption. In Figure 2, if span 3-4 fails, then devising a shortest hop path
between node 3 and the VP terminating node 8 produces the two hop detour 3-7-8, a more
efficient result than pure local rerouting. The essence of the algorithm is to retain as large a
portion of the original working path route as possible, then find the most direct path to the
destination of the failed VP whilst avoiding the failed span. From this very basic analysis, a
set of heuristics can be devised for an individual VP affected by failure of a span:


1. The starting point of the detour is the terminating node of the failed span on the side of
the VP with most hops; if both sides are equal, select at random.

2.  Find  the  shortest  hop  path  between  starting  point  of  detour  and  VP  destination 
node. 
3.  Add  retained  part  of  path.

4.  Remove  self-loops  and  discount  overlapping  resource  demands. 


Figure 2: Characteristics of local-destination rerouting

Step 4 is included because backhauling is still possible, as Figure 3 shows for the failure of
span 2-3. The path is of equal length on each side of the failed span (1 hop).
The  starting  point  of  the  detour  is  selected  as  node  3,  so  we  retain  hop  6-3  of  the  original 
VP  route,  and  seek  a  shortest  hop  path  between  node  3  and  the  VP  termination,  which 
is  node  1.  The  result  of  the  detour  is  thus  3-6-5-2-1,  and  when  we  concatenate  this  with 
the  retained  part  of  the  original  path,  we  obtain  6-3-6-5-2-1.  Obviously,  backhauling  has 
occurred  due  to  the  self-loop  6-3-6,  so  this  is  eliminated  leaving  6-5-2-1  as  the  new  VP 
route  employed  due  to  failure  of  span  2-3.  A  final  check  to  be  made  is  whether  or  not 
the  alternate  route  uses  any  of  the  original  VP  route  hops;  this  occurs  in  the  example 
with  respect  to  span  1-2,  hence  the  spare  capacity/VPI  requirements  for  this  hop  are 
discounted. 
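The four heuristics can be sketched in Python and checked against the Figure 3 example. This is a sketch, not the authors' implementation: the topology is assumed to contain only the spans mentioned in the text (1-2, 2-3, 3-6, 6-5 and 5-2), shortest hop routes come from a breadth-first search, and a tie in step 1 goes to the left-hand side of the VP instead of being broken at random. All function names are illustrative.

```python
from collections import deque

def shortest_hop_path(adj, src, dst, banned=frozenset()):
    """BFS shortest hop route avoiding banned spans (ties: sorted neighbour order)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def spans(route):
    return [frozenset(pair) for pair in zip(route, route[1:])]

def remove_self_loops(route):
    """Step 4: cut out any loop, e.g. 6-3-6-5-2-1 becomes 6-5-2-1."""
    out = []
    for node in route:
        if node in out:
            del out[out.index(node) + 1:]   # return to the first visit of node
        else:
            out.append(node)
    return out

def local_destination_reroute(adj, route, failed):
    i = next(j for j in range(len(route) - 1)
             if frozenset(route[j:j + 2]) == failed)
    left_hops, right_hops = i, len(route) - 2 - i
    keep_left = left_hops >= right_hops           # step 1 (tie -> left, not random)
    start, dest = (route[i], route[-1]) if keep_left else (route[i + 1], route[0])
    detour = shortest_hop_path(adj, start, dest, {failed})            # step 2
    if detour is None:
        raise ValueError("no surviving route for this VP")
    if keep_left:                                                     # step 3
        joined = route[:i + 1] + detour[1:]
    else:
        joined = detour[::-1] + route[i + 2:]
    new_route = remove_self_loops(joined)                             # step 4
    original = set(spans(route))
    spare = [s for s in spans(new_route) if s not in original]  # discount overlap
    return new_route, spare

edges = [(1, 2), (2, 3), (3, 6), (6, 5), (5, 2)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
new_route, spare = local_destination_reroute(adj, [6, 3, 2, 1], frozenset({2, 3}))
print(new_route)   # [6, 5, 2, 1]
```

Only spans 6-5 and 5-2 are returned as needing spare resources, since span 2-1 is shared with the original route.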


Figure 3: Self-loop and overlap during local-destination rerouting


2.1.3  Source-based  rerouting 

Source-based rerouting ought to be yet more efficient in terms of spare capacity (Anderson
et al, 1994), since alternate routes are computed between the terminating nodes of each path
affected by a failed span (Figure 4). The effect of this algorithm is to spread the demand for
spare capacity more freely throughout the network than the two preceding schemes. Any
overlap between the original path and the designated alternate route (e.g. span 1-2 of
Figure 4) is dealt with by discounting the spare resource demands for such spans.
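A minimal sketch of source-based rerouting with overlap discounting follows. The six-node 2 x 3 grid (nodes 1 to 3 on the top row, 4 to 6 below) is a hypothetical topology chosen for illustration, not the network of Figure 4; routes are found by breadth-first search and the function names are assumptions.

```python
from collections import deque

def shortest_hop_path(adj, src, dst, banned=frozenset()):
    """BFS shortest hop route avoiding banned spans (ties: sorted neighbour order)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def spans(route):
    return [frozenset(pair) for pair in zip(route, route[1:])]

def source_based_reroute(adj, route, failed):
    """Reroute end to end, then discount spans shared with the working route."""
    new_route = shortest_hop_path(adj, route[0], route[-1], {failed})
    if new_route is None:
        raise ValueError("no surviving route for this VP")
    original = set(spans(route))
    spare = [s for s in spans(new_route) if s not in original]
    return new_route, spare

# Hypothetical 2 x 3 grid: 1-2-3 on top, 4-5-6 below, verticals 1-4, 2-5, 3-6.
edges = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
new_route, spare = source_based_reroute(adj, [1, 2, 3], frozenset({2, 3}))
print(new_route)   # [1, 2, 5, 6, 3]; span 1-2 overlaps and is discounted
```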
Figure 4: Source-based rerouting (alternate route selected for failure of span 3-4)


2.2  Failure  independent  approach 

In  the  failure  independent  case,  a  single  alternate  VP  route  can  be  designed  to  protect  a 
working  VP  from  any  single  span  failure.  The  design  criterion  to  satisfy  this  requirement 
is that a span disjoint route be selected for protection. Therefore, regardless of which
underlying physical span causes the VP failure, the same protection route is employed for
restoration. From Figure 5, whether span 1-2, 2-3 or 3-4 fails, the same backup route,
1-5-6-7-8-4  protects  the  working  path  1-2-3-4.  Indeed,  failure  of  nodes  2  or  3  may  be 
circumvented  by  activating  this  same  route.  Because  there  is  a  single  alternate  protection 
route  for  a  working  VP,  the  backup  can  be  established  in  advance  of  failure  by  setting 
VPIs  at  the  appropriate  crossconnect  nodes;  from  Figure  5,  this  corresponds  to  nodes  5, 
6,  7  and  8.  Activation  of  such  a  VP  may  be  performed  by  altering  the  routing  table  at 
the  connection  endpoints  (i.e.  nodes  1  and  4  from  the  Figure).  It  is  at  such  nodes  that 
storage  of  alternate  routing  data  is  required. 


Figure 5: Failure independent (span disjoint) rerouting

The  algorithm  may  be  written: 

For each VP
    Find shortest hop span disjoint path
End For
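The loop above can be sketched in Python by deleting the working spans and searching for a shortest hop route in what remains. This sketch assumes that the Figure 5 topology consists only of the spans named in the text. It returns None when no span disjoint route survives. Fixing the working route first and then searching is only one way of computing disjoint paths. Function names are illustrative.

```python
from collections import deque

def shortest_hop_path(adj, src, dst, banned=frozenset()):
    """BFS shortest hop route avoiding banned spans (ties: sorted neighbour order)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def spans(route):
    return [frozenset(pair) for pair in zip(route, route[1:])]

def span_disjoint_backup(adj, route):
    """Shortest hop route between the VP end nodes sharing no span with it."""
    return shortest_hop_path(adj, route[0], route[-1], banned=set(spans(route)))

# Spans named in the Figure 5 example, assumed here to be the whole topology.
edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7), (7, 8), (8, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

working = [[1, 2, 3, 4]]
backups = [span_disjoint_backup(adj, vp) for vp in working]   # for each VP
print(backups)   # [[1, 5, 6, 7, 8, 4]]
```

Because the backup shares no span with its working path, the same entry serves every single span failure along the VP, and in this example it also avoids the transit nodes 2 and 3.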

At the heart of this, and of all the failure dependent strategies described previously, is
a shortest path computation. The common algorithm employed is the Floyd-Warshall
technique, based on distance matrix manipulation. A very simple modification is made:
given the choice of two equidistant routes, one of the two is selected at random.
It should be reiterated that for all the rerouting schemes investigated, shortest
hop paths are found based on link weights of unity. On a batch provisioning basis, this
produces suboptimal results in terms of spare capacity. If minimisation of the global spare
capacity is required, a more complex solution to the rerouting problem would be needed,
such as mathematical programming or stochastic techniques based on simulated annealing
(Coan  et  al,  1991).  The  techniques  employed  for  the  comparative  evaluation  detailed  in 
the  following  section  are  not  optimised  in  terms  of  spare  capacity  requirements  so  as  to 
ensure  that  none  of  the  other  metrics  which  are  quantified  become  negatively  biased. 
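The following is a hedged sketch of a Floyd-Warshall computation with unit link weights and a random choice between equidistant routes. It assumes a next-hop matrix for route reconstruction and a coin flip whenever a route via an intermediate node is exactly as short as the current one. The names `floyd_warshall_unit` and `route` are hypothetical, and the paper does not specify the tie-break beyond choosing at random.

```python
import math
import random

def floyd_warshall_unit(nodes, edges, rng):
    """All-pairs shortest hop routes with unit link weights.

    nxt[i][j] is the next hop from i towards j. When a route via k is exactly
    as short as the current one, the first hop is switched with probability
    1/2, i.e. a coin flip between the two equidistant candidates.
    """
    inf = math.inf
    dist = {i: {j: 0 if i == j else inf for j in nodes} for i in nodes}
    nxt = {i: {j: i if i == j else None for j in nodes} for i in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
        nxt[u][v], nxt[v][u] = v, u
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if k in (i, j):
                    continue
                cand = dist[i][k] + dist[k][j]
                if cand < dist[i][j]:
                    dist[i][j], nxt[i][j] = cand, nxt[i][k]
                elif cand == dist[i][j] < inf and rng.random() < 0.5:
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def route(nxt, i, j):
    """Follow next hops from i to j (None if j is unreachable)."""
    if nxt[i][j] is None:
        return None
    path = [i]
    while path[-1] != j:
        path.append(nxt[path[-1]][j])
    return path

# Four-node ring: nodes 1 and 4 are joined by two equidistant routes.
dist, nxt = floyd_warshall_unit([1, 2, 3, 4], [(1, 2), (2, 4), (1, 3), (3, 4)],
                                random.Random(1))
print(dist[1][4], route(nxt, 1, 4))
```

Changing the seed may change which of the two 2-hop routes between nodes 1 and 4 is returned, which is why the paper averages its results over replicated computations.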


3  COMPARATIVE  EVALUATION 

3.1  Network  assumptions 

A  variety  of  network  models  will  be  used  to  generate  performance  data  for  each  of  the  four 
VP rerouting methods. Some simplifying assumptions are made for the analysis. The
networks  are  meshed  backbones  comprising  VP  crossconnects,  each  of  which  is  assumed 
to  be  collocated  with  a  VC  switch.  Measurements  of  required  resources  (spare  capacity 
and  VPIs)  and  path  lengths  correspond  to  the  inter  crossconnect  spans,  not  the  links 
between  VC  and  VP  switching  elements.  Each  span  between  crossconnect  nodes  will 
carry  just  one  bidirectional  fibre  transmission  system,  enabling  simple  computation  of 
redundant resources. In each network considered, a single bidirectional VP of unit capacity
is established between each node pair, hence in an $n$ node network there are $n(n-1)/2$
Virtual Paths. These working paths are generated using the Floyd-Warshall algorithm
with shortest hop routes being selected. Alternate routing information, which forms the
basis of the design metrics for comparison, is then produced for each of the four schemes
detailed in the previous section. Full protection from single span failures is provided in
each case.

Prior to the analytic detail of individual metrics, some basic nomenclature is introduced.
The physical network is described as a graph $G(V,E)$, where $V$ is the set of vertices
representing ATM nodes and $E$ is the set of edges representing inter-nodal spans. A single
vertex is denoted $v$ ($v \in V$), whilst a single edge is symbolised as $e$ ($e \in E$). The
working capacity of an edge $e$ is denoted $W_e$, whilst its spare capacity is $S_e$. The
logical network is described by the set of paths $P$, whereby a single path $\pi$ ($\pi \in P$)
is the collection of edges traversed. The capacity of a path $\pi$ is $C_\pi$. The set of
protection routes is defined as $\bar{P}$, with the protection path pertaining to a working
path $\pi$ denoted $\bar{\pi}$ in the failure independent case and $\pi^f$ in the failure
dependent case, the superscript $f$ denoting the index of the failed edge, $e_f$. Note that
$\pi^f$ represents the end-to-end route of the path $\pi$ following restoration from failure of
edge $e_f$, part of which is often unchanged. For clarity, we further define $\pi_d^f$ to be the
edges of the rerouted part of the path only (the subscript $d$ refers to detour). There are
$m$ edges, $n$ vertices and $k$ paths in the network. Additional notation will be introduced
where necessary.
3.2  Computation  of  metrics

Given  the  working  VP  and  alternate  routing  information,  the  following  metrics  can  be 
computed  for  each  of  the  four  rerouting  schemes  applied  to  several  network  topologies. 


3.2.1  Spare  Capacity  Ratio  (SCR) 

The SCR is the ratio of the aggregate spare capacity in the network to the aggregate
working capacity. The working capacity of an edge is found by summing the capacities of
its constituent paths:

$$W_e = \sum_{\pi \in P,\; e \in \pi} C_\pi \qquad (1)$$


Hence,  the  total  working  capacity  is  found  by  summing  (1)  over  the  set  of  network  edges. 
Computing the individual spare capacity quotas per edge is a little more complex. Depending
on the edge which has failed in the network, the demanded spare capacity on the remaining
edges differs. This is because a different set of working paths is affected by each possible
failure, hence a different reconfiguration is performed in each case. Letting $S_e^f$ denote
the spare capacity required on edge $e$ due to failure of edge $e_f$, we have, for the failure
independent case:

$$S_e^f = \sum_{\pi \in P,\; e_f \in \pi,\; e \in \bar{\pi}} C_\pi \qquad (2)$$

or, for the failure dependent case:

$$S_e^f = \sum_{\pi \in P,\; e_f \in \pi,\; e \in \pi_d^f} C_\pi \qquad (3)$$

Now, the provisioning of spare capacity on each edge must account for the edge failure
which yields the greatest demand for rerouted traffic. We thus find the required spare
capacity for an edge $e$, denoted $S_e$, as follows:

$$S_e = \max\{S_e^1, S_e^2, \ldots, S_e^m\} \qquad (4)$$


It  should  be  stressed  that  no  attempt  is  made  at  capacity  modularisation  so  as  to  conform 
to  specific  transmission  systems.  The  value  of  SCR  is  subsequently  found  by  dividing  the 
total  spare  capacity  by  the  total  working  capacity: 


$$SCR = \frac{\sum_{e \in E} S_e}{\sum_{e \in E} W_e} \qquad (5)$$
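Equations (1) to (5) translate directly into a short computation. The sketch below assumes spans are frozensets of end nodes and that the caller supplies, for each (VP, failed span) pair, the spans that need spare capacity. In the failure independent case this is the whole backup route. In the failure dependent case it is the detour, with overlapping spans already discounted. The four-node ring is a hand-made illustration, not one of the paper's test networks, and the function names are hypothetical.

```python
from collections import defaultdict

def spans(route):
    """Undirected spans (frozensets of end nodes) traversed by a route."""
    return [frozenset(pair) for pair in zip(route, route[1:])]

def spare_capacity_ratio(working, capacity, spare_spans):
    """Eqs. (1)-(5).

    working[i]  : node sequence of working VP i (the set P)
    capacity[i] : C_pi of working VP i
    spare_spans : function (i, failed_span) -> spans carrying VP i's
                  rerouted traffic when failed_span fails
    """
    W = defaultdict(float)                       # W_e, eq. (1)
    for route, c in zip(working, capacity):
        for e in spans(route):
            W[e] += c
    demand = defaultdict(float)                  # S_e^f, eqs. (2)-(3)
    for i, (route, c) in enumerate(zip(working, capacity)):
        for f in spans(route):                   # each span whose failure hits VP i
            for e in spare_spans(i, f):
                demand[(e, f)] += c
    S = defaultdict(float)                       # S_e = max over failures, eq. (4)
    for (e, _f), s in demand.items():
        S[e] = max(S[e], s)
    return sum(S.values()) / sum(W.values()), dict(S)   # SCR, eq. (5)

# Four-node ring, one unit VP per node pair, span disjoint backups.
working = [[1, 2], [2, 3], [3, 4], [4, 1], [1, 2, 3], [2, 3, 4]]
backups = [[1, 4, 3, 2], [2, 1, 4, 3], [3, 2, 1, 4], [4, 3, 2, 1], [1, 4, 3], [2, 1, 4]]
scr, S = spare_capacity_ratio(working, [1.0] * len(working),
                              lambda i, f: spans(backups[i]))
print(scr)   # 1.0
```

For this ring the sketch returns an SCR of exactly 1.0 (100% spare capacity); in a mesh, sharing spare capacity between failures through equation (4) is what allows a lower ratio.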
3.2.2  Mean  VPI  Redundancy  (MVR)

When  protection  routes  are  designed,  Virtual  Path  Identifiers  (VPIs)  must  be  reserved 
for  the  appropriate  links.  The  total  number  of  idle  VPIs  is  a  function  of  the  number  of 
edges  used  in  each  protection  route  since  VPI  translation  is  performed  for  each  link  of 
a VP connection (ITU-T, 1993a). For the failure independent case, letting $L(\bar{\pi})$ be the
length (number of edges used) of a specific protection path $\bar{\pi}$, the total number of idle
VPIs, denoted $N_V$, is found from:

$$N_V^{fi} = \sum_{\bar{\pi} \in \bar{P}} L(\bar{\pi}) \qquad (6)$$

The label $fi$ indicates failure independent. In a similar fashion, $fd$ will specify the
failure dependent version of appropriate metrics. For the failure dependent case, $L(\pi_d^f)$
is the number of spans in the rerouted part of the end-to-end working path $\pi$, activated
due to failure of $e_f$. Considering all failures per path, and the complete set of paths in
the network:

$$N_V^{fd} = \sum_{\pi \in P} \sum_{e_f \in \pi} L(\pi_d^f) \qquad (7)$$

Now, given the total number of VPIs, regardless of the rerouting scheme, the MVR may
be found by dividing the appropriate $N_V$ by $m$ (the number of edges), giving the mean
VPI redundancy per edge; this quantity can then be normalised to the maximum number
of VPIs per link (4096), yielding:

$$MVR = \frac{N_V / m}{4096} \qquad (8)$$
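Equations (6) to (8) can be evaluated with the short sketch below, assuming route lengths are counted in spans (hops). The helper names and the four-node ring backups are illustrative only.

```python
MAX_VPIS_PER_LINK = 4096   # normalisation constant used in eq. (8)

def hops(route):
    """L(route): number of spans in a route given as a node sequence."""
    return len(route) - 1

def idle_vpis_fi(backups):
    """Eq. (6): one idle VPI per span of every span disjoint backup route."""
    return sum(hops(r) for r in backups)

def idle_vpis_fd(detour_hops):
    """Eq. (7): detour_hops holds L(pi_d^f) for every (VP, failed span) pair."""
    return sum(detour_hops)

def mvr(n_vpis, m):
    """Eq. (8): mean idle VPIs per span, normalised to 4096 VPIs per link."""
    return (n_vpis / m) / MAX_VPIS_PER_LINK

# Span disjoint backups for one VP per node pair on a four-node ring (m = 4).
backups = [[1, 4, 3, 2], [2, 1, 4, 3], [3, 2, 1, 4], [4, 3, 2, 1], [1, 4, 3], [2, 1, 4]]
n_fi = idle_vpis_fi(backups)
print(n_fi, mvr(n_fi, m=4))   # 16 0.0009765625
```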


3.2.3  Path  Elongation  Factor  (PEF)

This  is  simply  the  ratio  of  the  mean  length  of  a  VP  rerouted  during  failure  restoration  to 
the  mean  working  path  length.  For  the  failure  independent  scheme,  this  is  given  by  the 
equation: 


pefr  = 


E„6pW 


(9) 


For  the  failure  dependent  scheme,  we  first  find  the  mean  rerouted  path  length  for  an 
individual  path,  denoted  n^,  given  by: 


n_ 


Eefg^(7Tf) 

L(tt) 


(10) 
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Now,  averaging  over  all  paths  in  the  network  provides  the  mean  rerouted  path  length, 
which,  when  divided  by  the  mean  working  path  length  gives  the  PEF  for  the  failure 
dependent  rerouting  as: 


PEFid  = 


SireP  < 

£tt6P  £(*■)' 


(11) 
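A sketch of equations (9) to (11) follows. It uses the Figure 11 example discussed later in the text as a check: a 3-hop working path 1-2-5-6, a disjoint backup that is 2 hops longer, and source-based reroutes 1-4-5-6 (failure of span 1-2 or 2-5) and 1-2-3-6 (failure of span 5-6). The backup is written here as 1-4-7-8-9-6, which is an assumption consistent with that 2-hop difference. Function names are illustrative.

```python
def hops(route):
    """L(route): number of spans in a route given as a node sequence."""
    return len(route) - 1

def pef_fi(working, backups):
    """Eq. (9): total backup length over total working length."""
    return sum(map(hops, backups)) / sum(map(hops, working))

def pef_fd(working, rerouted):
    """Eqs. (10)-(11): rerouted[i] holds pi^f for every span e_f of VP i."""
    n = [sum(map(hops, routes)) / hops(w) for w, routes in zip(working, rerouted)]
    return sum(n) / sum(map(hops, working))

working = [[1, 2, 5, 6]]
backups = [[1, 4, 7, 8, 9, 6]]                            # assumed 5-hop backup
rerouted = [[[1, 4, 5, 6], [1, 4, 5, 6], [1, 2, 3, 6]]]   # spans 1-2, 2-5, 5-6 fail
print(pef_fi(working, backups), pef_fd(working, rerouted))
```

With a single VP the failure independent factor is 5/3, whereas the failure dependent factor is exactly 1, i.e. no elongation.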


3.2.4  Mean  Memory  Requirements  (MMR)

The  memory  requirements  for  pre-stored  data  are  quite  distinct  for  each  approach  to 
rerouting.  With  the  failure  dependent  technique,  VPIs  and  link  IDs  are  associated  with 
specific  failures.  The  information  is  stored  in  a  database  and  is  only  loaded  into  the  active 
VP  routing  tables  when  the  crossconnect  is  notified  of  the  failure.  Such  an  operation  is 
carried  out  at  all  the  participating  nodes  of  an  alternate  route  detour.  In  contrast  to  this, 
the  failure  independent  approach  involves  pre-loading  translation  tables  of  all  downstream 
nodes  with  the  VPIs  of  the  alternate  route;  at  such  nodes,  there  are  no  database  memory 
requirements  for  VPI/link  ID  information.  This  is  because  the  translation  table  itself 
contains  the  mapping  between  input  and  output  VPIs.  Since  such  tables  will  be  designed 
for  the  maximum  possible  number  of  VPs  passing  through  a  node,  there  is  effectively  no 
overhead. Alternate VP routing data need only be stored in a database at the VP endpoints,
where it is used to reload the translation tables when these nodes learn that the
VP has failed (Veitch et al, 1995a). Some simple assumptions will now be made to enable
an  approximate  enumeration  of  memory  requirements  for  the  two  rerouting  paradigms. 
For  the  failure  independent  method,  an  alternate  (VPI(out)/Link(out),VPI(in)/Link(in)) 
pairing  is  associated  with  a  bidirectional  VP  at  each  endpoint,  as  depicted  in  Figure  6(a). 


Figure 6: Storage format for alternate routing information: (a) failure independent; (b) failure dependent


This covers the upstream and downstream parts of the VP. We assume that each entry
takes 16 (2 × 8) bytes of memory for storage. If one bidirectional span disjoint protection
path is allocated to each of k bidirectional VPs in a network, the total memory requirement
is simply 2 × k × 16 bytes. Thus, the MMR metric, which gives the average memory
requirement per node, is:

$$MMR_{fi} = \frac{32 \cdot k}{n} \qquad (12)$$
For failure dependent techniques, the input and output VP information corresponding to
a particular span failure for one direction of a VP is shown in Figure 6(b). It is assumed
that 10 bytes are consumed with this format. If $L(\pi_d^f)$ span hops are used in a certain
protection detour, information storage is required at $L(\pi_d^f) + 1$ nodes. Thus, in any failure
dependent scheme, the memory required for a bidirectional alternate route employed when
a specific span fails is:

$$2 \times \left(L(\pi_d^f) + 1\right) \times 10 \text{ bytes.}$$

To compute the total memory requirements for a network, the above quantity is
summed over all possible span failures related to all bidirectional VPs. The MMR is
then found by dividing by $n$, the number of nodes, to give:

$$MMR_{fd} = \frac{20 \sum_{\pi \in P} \sum_{e_f \in \pi} \left(L(\pi_d^f) + 1\right)}{n} \qquad (13)$$
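Equations (12) and (13) reduce to simple byte counts. The sketch below uses the entry sizes assumed in the text (16 bytes per failure independent endpoint entry, 10 bytes per failure dependent node entry); the inputs and function names are illustrative.

```python
FI_ENTRY_BYTES = 16   # per VP endpoint entry, failure independent (2 x 8 bytes)
FD_ENTRY_BYTES = 10   # per node entry for one direction, failure dependent

def mmr_fi(k, n):
    """Eq. (12): two endpoint entries per bidirectional VP, averaged over n nodes."""
    return 2 * k * FI_ENTRY_BYTES / n

def mmr_fd(detour_hops, n):
    """Eq. (13): detour_hops lists L(pi_d^f) for every (VP, failed span) pair."""
    return sum(2 * (h + 1) * FD_ENTRY_BYTES for h in detour_hops) / n

print(mmr_fi(k=6, n=4), mmr_fd([3, 2, 2], n=4))   # 48.0 50.0
```

The failure independent figure depends only on the number of VPs, whereas the failure dependent figure grows with every (VP, failed span) pair and with each detour's length. This is why the gap between the two widens as networks grow.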

3.2.5  Routing  Computational  Effort  (RCE) 

The  computation  required  to  produce  alternate  routes  is  non-trivial  since  working  VP 
configurations  may  be  subject  to  capacity  and/or  routing  re-allocation  (Sato  et  al,  1990)  at 
regular  intervals.  This  implies  that  protection  plans  have  to  be  revised  in  accordance  with 
the  new  VP  arrangement.  Fast  computation  is  thus  essential  to  minimise  the  probability 
that  a  failure  will  occur  between  the  time  of  the  working  VP  rearrangement  and  the 
assignment  of  new  protection  routes.  We  express  the  RCE  metric  in  the  simplest  possible 
way,  that  is  by  the  number  of  rerouting  computations  for  the  required  protection  condition, 
assumed  throughout  to  be  single  span  failures.  For  the  failure  independent  scheme,  since 
there  is  a  protection  path  for  each  of  the  k  working  paths,  we  have: 

$$RCE_{fi} = k \qquad (14)$$

For  the  failure  dependent  scheme,  alternate  routes  are  found  for  each  failed  path  of  every 
failed  span.  The  total  number  of  alternate  routes  required  can  hence  be  found  by  summing 
the  number  of  spans  used  in  each  path  to  obtain: 


$$RCE_{fd} = \sum_{\pi \in P} L(\pi) \qquad (15)$$
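With one shortest hop VP per node pair, equation (15) equals the sum of hop distances over all node pairs, which a breadth-first search gives directly, while equation (14) is simply $n(n-1)/2$. The sketch assumes grid shapes of 2 x 3, 3 x 3, 3 x 4 and 4 x 5 for the 6, 9, 12 and 20 node networks. The text gives only the node counts, so these shapes are a guess. The total of shortest hop lengths does not depend on how ties are broken, so no random replication is needed here. Function names are illustrative.

```python
from collections import deque

def grid(rows, cols):
    """Adjacency of a rows x cols grid, nodes numbered from 1 row by row."""
    adj = {n: set() for n in range(1, rows * cols + 1)}
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c + 1
            if c + 1 < cols:                      # horizontal span
                adj[u].add(u + 1)
                adj[u + 1].add(u)
            if r + 1 < rows:                      # vertical span
                adj[u].add(u + cols)
                adj[u + cols].add(u)
    return adj

def hop_distances(adj, src):
    """BFS hop counts from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def routing_computational_effort(adj):
    """Eqs. (14)-(15) with one shortest hop working VP per node pair."""
    nodes = sorted(adj)
    k = len(nodes) * (len(nodes) - 1) // 2        # number of VPs
    rce_fd = 0
    for a in nodes:
        d = hop_distances(adj, a)
        rce_fd += sum(d[b] for b in nodes if b > a)   # L(pi) of VP (a, b)
    return k, rce_fd                              # (RCE_fi, RCE_fd)

for rows, cols in [(2, 3), (3, 3), (3, 4), (4, 5)]:
    print(rows * cols, routing_computational_effort(grid(rows, cols)))
```

For the assumed 3 x 3 grid the sketch gives 36 computations for the failure independent scheme and 72 for the failure dependent schemes, i.e. two rerouting computations per VP on average.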


3.3  Numerical  results 

A  computer  program  was  written  which  takes  any  network  topology  description  as  its 
input,  and  produces  the  above  metrics  as  its  output  by  realising  each  of  the  alternate 
routing strategies. Shortest path routes were found for working paths, with a random
choice between equal length paths. For simplicity, all working VPs were assumed to be of
unit capacity. The SCR, MVR, PEF, MMR and RCE metrics were computed for four
grid networks of 6, 9, 12 and 20 nodes. Because the shortest path algorithm has a random
outcome, each result is the mean of 5 replicated computations. Figures 7 and 8 display the
SCR and MVR results, respectively. In all graphs, plotted points are joined up for visual
convenience.


Figure 7: Spare Capacity Ratio (SCR) versus network size


Figure 8: Mean VPI Redundancy (MVR) versus network size
In Figure 7, it can be seen that the failure independent (i.e. span disjoint) rerouting
scheme produces the lowest spare capacity ratio. As to why this is better than
source-based rerouting, it could be argued that the disjointness constraint forces backup
routes to spread the demand for spare capacity around the network. In source-based
rerouting, by contrast, the protection routes may often re-use original VP links, thus
concentrating spare capacity requirements on links closer to the failure, as in the
local-destination approach. Of the three failure dependent approaches, the source-based
rerouting method requires the least spare capacity, owing to its greater degree of freedom
in route selection. Local-destination rerouting improves the efficiency of alternate routing
design over local rerouting due to the elimination of backhauling.

From Figure 8, the VPI redundancy is greater for the failure dependent approaches, and
the divergence between these and the failure independent scheme increases with network
size. The main reason is that, because different routes are allocated to the individual
failures which may affect a given VP, many more links are potentially involved in the
rerouting process. Although the failure independent scheme used fewer VPIs than all of the
failure dependent methods in the examples considered, this need not always be the case.
One of the reasons that less than 100% spare capacity is needed for failure protection
in a mesh network is that sharing of resources between possible failures (equation (4))
is exploited. This sharing of resources between disparate failure events may also be applied
to VPIs. In the computations so far, a different VPI is employed for each span of every
protection route, regardless of whether or not they correspond to different failures. As
with capacity sharing, however, the same VPI may be re-used across different failures. This
is feasible in the failure dependent rerouting schemes since VPIs are stored in databases,
only to be loaded into lookup tables upon failure notification. The prospect of “VPI
sharing” therefore presents an advantage of failure dependent over failure independent
rerouting, because VPIs cannot be shared amongst protection paths defined by active VPI
entries in lookup tables: routing would become ambiguous.

We revise the MVR for the failure dependent case by defining an integer $I_e^f$ to be the
number of VPIs needed on edge $e$ due to failure of edge $e_f$. The worst-case quantity of
VPIs required on an edge $e$ is thus:

$$I_e = \max\{I_e^1, I_e^2, \ldots, I_e^m\} \qquad (16)$$

Hence, the total number of reserved VPIs will be:

$$N_V^{fd} = \sum_{e \in E} I_e \qquad (17)$$

which can be used in equation (8) to provide the MVR. The MVR metric was subsequently
recomputed with VPI sharing allowed in the failure dependent schemes and, as shown in
Figure 9, the result is a lower mean redundancy of VPIs than with the span disjoint
rerouting technique. Where VPI numbers are re-used between different failures, the relative
order of the failure dependent schemes in terms of increasing resource demand is the
same as that for the SCR metric.
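The effect of VPI sharing can be seen by counting VPIs for the same set of failure-specific detours twice. The first count gives every detour span its own VPI, as in equation (7). The second takes the worst case per span over all failures, as in equations (16) and (17). The detours below are hypothetical, keyed by (VP index, failed span), and the function names are illustrative.

```python
from collections import Counter, defaultdict

def vpis_without_sharing(detours):
    """Eq. (7): a separate VPI on every span of every failure-specific detour."""
    return sum(len(spans) for spans in detours.values())

def vpis_with_sharing(detours):
    """Eqs. (16)-(17): VPIs on a span are re-used across different failures."""
    per_failure = defaultdict(Counter)          # per_failure[e_f][e] = I_e^f
    for (_vp, failed), spans in detours.items():
        per_failure[failed].update(spans)
    worst = Counter()                           # I_e = max over failures
    for counts in per_failure.values():
        for e, n in counts.items():
            worst[e] = max(worst[e], n)
    return sum(worst.values())                  # N_V^fd

def s(a, b):
    """Undirected span between nodes a and b."""
    return frozenset((a, b))

# Hypothetical detours: detours[(vp, failed span)] = spans of pi_d^f
detours = {
    (0, s(1, 2)): [s(1, 4), s(4, 3), s(3, 2)],
    (1, s(1, 2)): [s(1, 4), s(4, 3)],
    (1, s(2, 3)): [s(1, 4), s(4, 3)],
}
print(vpis_without_sharing(detours), vpis_with_sharing(detours))   # 7 5
```

Both detours triggered by failure of span 1-2 need their own VPIs on spans 1-4 and 4-3, but the detour for span 2-3 can re-use them, so sharing reduces the count from 7 to 5.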


Figure 9: Recomputed MVR with VPI sharing versus network size


The  PEF  metric  is  shown  in  Figure  10. 


Figure 10 shows that the least path elongation is obtained with source-based rerouting,
which improves on the span disjoint rerouting results. Local-destination rerouting is
considerably better than local rerouting, the latter being the most sensitive to path
elongation, mainly due to backhauling effects. To help understand why source-based
rerouting should outperform span disjoint rerouting when both techniques reroute from the
path terminating nodes, consider Figure 11.


Figure 11: Potential path elongation with span disjoint rerouting ((a) failure independent rerouting, (b) source-based failure dependent rerouting; the working VP is marked in each panel)


In part (a) of the figure, the working path 1-2-5-6 is protected by the backup route 1-4-7-8-9-6 under the failure independent rerouting scheme, a difference of 2 hops between the working and protection routes. Referring to part (b), with the source-based rerouting version of failure dependent protection, route 1-4-5-6 could be selected for the failure of span 1-2 or 2-5, while for the failure of span 5-6 the new route could be 1-2-3-6. In all such cases the working and protection routes are the same length, i.e. there is no elongation. The inferior PEF of failure independent rerouting is thus due to the disjointness criterion. It should be pointed out, however, that with a more careful choice of working route between the same nodes in Figure 11, e.g. 1-2-3-6, a span disjoint backup path 1-4-5-6 could be allocated with no elongation. This demonstrates the inherent dependence of protection routing design on the particular layout of working path routes, a point noted by Coan et al, who suggested joint optimisation of working and protection layouts to achieve a truly globally optimal design (Coan et al, 1991).
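The hop counts in the Figure 11 example can be checked with a short breadth-first search. This Python sketch assumes, as the quoted routes suggest, that Figure 11 shows a 3 x 3 grid with nodes numbered 1 to 9 row by row; the function names are illustrative, and shortest-hop routing is used as in the comparison above. Span disjoint backup routing bans every span of the working path, whereas source-based failure dependent routing bans only the failed span.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Adjacency = Dict[int, List[int]]
Span = FrozenSet[int]


def grid(rows: int, cols: int) -> Adjacency:
    """Grid graph with nodes numbered 1..rows*cols, row by row."""
    adj: Adjacency = {n: [] for n in range(1, rows * cols + 1)}
    for r in range(rows):
        for c in range(cols):
            n = r * cols + c + 1
            if c + 1 < cols:  # horizontal span
                adj[n].append(n + 1)
                adj[n + 1].append(n)
            if r + 1 < rows:  # vertical span
                adj[n].append(n + cols)
                adj[n + cols].append(n)
    return adj


def spans(path: List[int]) -> Set[Span]:
    return {frozenset(hop) for hop in zip(path, path[1:])}


def shortest_hops(adj: Adjacency, src: int, dst: int, banned: Set[Span]) -> Optional[int]:
    """Breadth-first search for the fewest hops from src to dst avoiding banned spans."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist and frozenset((u, v)) not in banned:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def span_disjoint_elongation(adj: Adjacency, working: List[int]) -> Optional[int]:
    """Extra hops of the shortest backup avoiding every span of the working path."""
    hops = shortest_hops(adj, working[0], working[-1], spans(working))
    return None if hops is None else hops - (len(working) - 1)


def source_based_elongation(adj: Adjacency, working: List[int]) -> Dict[Tuple[int, ...], Optional[int]]:
    """Extra hops of the shortest reroute for each single failed span."""
    result: Dict[Tuple[int, ...], Optional[int]] = {}
    for s in spans(working):
        hops = shortest_hops(adj, working[0], working[-1], {s})
        result[tuple(sorted(s))] = None if hops is None else hops - (len(working) - 1)
    return result


adj = grid(3, 3)
print(span_disjoint_elongation(adj, [1, 2, 5, 6]))  # 2
print(source_based_elongation(adj, [1, 2, 5, 6]))   # 0 for every failed span
print(span_disjoint_elongation(adj, [1, 2, 3, 6]))  # 0
```

The last call reproduces the observation that choosing the working route 1-2-3-6 instead lets a span disjoint backup avoid elongation altogether.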

The remaining metrics, MMR and RCE, are shown in Figures 12 and 13, respectively. The estimate of database storage required per node in Figure 12 clearly shows the gap between failure dependent and failure independent rerouting strategies. Indeed, the gap widens with network size, with source-based rerouting proving increasingly sensitive. Of course, it may be argued that a few kilobytes of memory are unimportant; however, the estimates could be misleading. The reason is that a mean demand per node was computed, which is fairly artificial since some maximum value would be used in practice. Also, the storage would have to accommodate future physical and logical growth, since the assumption of a single VP between each node pair will often be unrealistic.

The computational effort needed to produce rerouting information is shown in Figure 13, with no distinction between the failure dependent schemes, since only the number of required alternate routes was evaluated. If desired, a suitable weighting for each scheme would allow individual curves to be drawn, although this is not considered in this paper. The curve is striking because it highlights the sizeable computational overhead of failure dependent rerouting compared with its failure independent counterpart.
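A rough feel for why this gap grows with network size comes from counting the alternate routes that must be precomputed. The Python sketch below is an illustration of route counts, not the RCE formula used here. It assumes an n x m grid, one VP per unordered node pair on a shortest-hop route (whose length in a full grid equals the Manhattan distance), and single span failures only. Under these assumptions, failure independent rerouting needs one backup per VP, while failure dependent rerouting needs one alternate route per span that each VP traverses.

```python
from itertools import combinations, product
from typing import Tuple


def alternate_route_counts(rows: int, cols: int) -> Tuple[int, int]:
    """(failure independent, failure dependent) numbers of precomputed routes.

    Assumes one VP per unordered node pair on a shortest-hop route and single
    span failures: one backup per VP versus one alternate per span a VP uses.
    """
    nodes = product(range(rows), range(cols))
    failure_independent = 0
    failure_dependent = 0
    for (r1, c1), (r2, c2) in combinations(nodes, 2):
        hops = abs(r1 - r2) + abs(c1 - c2)  # shortest-hop length in a full grid
        failure_independent += 1            # one backup path per working VP
        failure_dependent += hops           # one alternate route per traversed span
    return failure_independent, failure_dependent


for n in (3, 4, 5):
    print(n, alternate_route_counts(n, n))  # (36, 72), (120, 320), (300, 1000)
```

The ratio of failure dependent to failure independent routes rises from 2 to about 3.3 between the 3 x 3 and 5 x 5 grids, because it tracks the mean path length.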


Figure 12: Mean Memory Requirements (MMR) versus network size


Figure 13: Routing Computational Effort (RCE) versus network size

4 DISCUSSION

4.1  Signalling  protocol  complexity 

The preceding section presented a comparison of four pre-planned restoration schemes incorporating distinct rerouting policies, three of which are special cases of failure dependent rerouting, whilst the fourth constitutes the failure independent policy. An important facet of restoration not yet discussed is the signalling protocol employed to activate pre-assigned routes. First, the speed with which restoration can be accomplished is paramount, since it governs the extent to which services will be adversely affected by the failure. The AT&T paper (Anderson et al, 1994) describes no computer simulations of the signalling protocol that executes failure dependent restoration; instead, it cites an estimated restoration completion time of 58 msec for a 40-node network under modest processing time assumptions. In (Veitch et al, 1995a), simulations of distributed protocols realising failure independent backup path restoration suggest that completion times of tens of milliseconds are possible. It is therefore plausible that both methods of VP rerouting achieve comparable restoration completion times. A further concern is the ease with which protocols can be implemented. In the failure independent scheme, bidirectional F4 Operations, Administration and Maintenance (OAM) flows can be used to convey alarm and confirmation signals between the endpoints of the failed VP and of the protection VP, respectively. This is a significant advantage given that certain OAM cells are already standardised (ITU-T, 1993b). With the failure dependent schemes, inter-nodal signalling channels, whose properties have yet to be established, must be employed to broadcast failure notification signals.


4.2  Planning  adaptability  and  protocol  robustness 

The final issue to be considered as a basis for comparing the failure independent and failure dependent schemes, with the latter grouped as a whole, is robustness. First, consider failure adaptability: how does each restoration scheme, in planning and execution, handle multiple span or node failures? With failure independent rerouting, transit node failures can be protected against by allocating node disjoint protection paths with a matching spare capacity allocation (Kawamura et al, 1994). No change to the signalling protocol is necessary, since protection route activation is performed in the same way as for span failures, and no additional storage overheads are incurred. Unavailability of a protection route due to a multiple failure is easily identified, because explicit confirmation of backup paths is orchestrated from the endpoints (Veitch et al, 1995a). This could lead to a dynamic route searching protocol being invoked, or to direct notification of the problem to a central controller. Planning for multiple span or node failures with failure dependent rerouting, by contrast, significantly increases the complexity of the whole approach. First, concerning the required planning effort, storage overheads and routing computation would increase sizeably, because alternate routes are associated with specific failures, and this burden would grow with larger network topologies. Secondly, the signalling protocol would have to be modified so that nodes which receive broadcast messages obtain an unambiguous picture of the current physical network topology.

The last source of uncertainty that puts the alternative schemes to the test is a possible lack of spare capacity in the network with which to support rerouted traffic. Although planning is performed in conjunction with spare capacity placement, or indeed subject to spare capacity constraints (Veitch et al, 1995c), occasions can arise where supply will not meet demand. If protection routes are activated under such circumstances, the quality of service of existing connections that share common buffer and transmission resources, and were unaffected by the failure in the first place, could be unacceptably degraded. Because failure independent rerouting involves explicit confirmation of protection path capacity availability (Kawamura et al, 1994, Veitch et al, 1995a), if a path cannot be supported the situation is quickly recognised and appropriate action taken. The problem with the failure dependent approach is that there is no notion of “capacity capturing” during crossconnect table activation, which is executed for a whole bundle of rerouted paths at once. This places a question mark over the supposed robustness of failure dependent rerouting.
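The “capacity capturing” that explicit confirmation provides can be sketched at the level of spare-capacity bookkeeping. The Python example below is a hypothetical illustration, not the protocol of Kawamura et al or Veitch et al: the OAM cell exchange is abstracted away, and the function name, span keys and demand units are assumptions. Capacity is captured span by span along the pre-planned backup path; if any span falls short, the partial reservation is rolled back and the endpoint receives a negative result instead of a silent overload.

```python
from typing import Dict, FrozenSet, List

Span = FrozenSet[str]


def capture_backup_capacity(path: List[str], demand: float,
                            spare: Dict[Span, float]) -> bool:
    """Capture spare capacity span by span along a pre-planned backup path.

    Returns True (the confirmation) only if every span can carry `demand`.
    Otherwise any capacity already captured is released and False is returned,
    so the path endpoint learns at once that the backup cannot be supported.
    """
    captured: List[Span] = []
    for span in (frozenset(hop) for hop in zip(path, path[1:])):
        if spare.get(span, 0.0) < demand:
            for s in captured:  # roll back the partial reservation
                spare[s] += demand
            return False
        spare[span] -= demand
        captured.append(span)
    return True


spare = {frozenset("AB"): 10.0, frozenset("BC"): 4.0}
print(capture_backup_capacity(["A", "B", "C"], 5.0, spare))  # False, spare unchanged
print(capture_backup_capacity(["A", "B", "C"], 3.0, spare))  # True
```

A negative result is exactly the point at which a dynamic route search or a report to a central controller could be triggered. A bulk crossconnect table activation has no equivalent step.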


5  SUMMARY  AND  CONCLUSIONS 

This paper has highlighted the fundamental differences between two pre-planned VP restoration paradigms, the failure dependent and the failure independent methods. The choice of strategy influences implementation costs in terms of spare capacity, reserved VPIs, computational overheads and memory for rerouting information storage. Furthermore, the anticipated path elongation, which affects the delay experienced by rerouted connections, must be accounted for. Metrics corresponding to all these factors were formulated and then, for a variety of grid network models, a comparative evaluation was carried out between the failure independent span disjoint rerouting scheme and three distinct failure dependent rerouting policies.

The span disjoint scheme required the least spare capacity for all networks considered, with the source-based rerouting version of failure dependent restoration a close second. The important point to note is that these results were not optimised; rather, a shortest hop routing algorithm was used throughout for comparative purposes. If optimisation were performed by minimising a cost function based on spare capacity, an intuitive argument suggests that source-based failure dependent rerouting would require less spare capacity than the failure independent case. This is due to the tailoring of alternate routes to the actual failure, something which failure independent rerouting does not cater for. The other two failure dependent schemes, local-destination and local rerouting, displayed greater demand for spare capacity, the latter being the “greediest” because of frequent backhauling, in which the same span is re-used in a route. Regarding VPI redundancy for protection routing, the outcome depends on whether or not VPI sharing is administered in the failure dependent schemes. Without VPI sharing, the failure independent scheme requires fewer idle VPIs than all the failure dependent methods; with sharing, it is the failure independent method that incurs the greatest redundancy. Path elongation is minimised with source-based rerouting, whilst the span disjoint scheme improves over the other two failure dependent policies. The span disjoint scheme is inferior to the source-based method in terms of path elongation because certain choices of working path route force the disjoint backup path to use more spans than is theoretically necessary. As expected, local rerouting was the most sensitive to path elongation effects, again due to backhauling. In terms of storage overheads and routing computational effort, meanwhile, failure independent rerouting exhibits a clear advantage over all failure dependent schemes, with significantly less memory required and a computational effort that is proportional only to the number of paths in the network.

It is evident that in terms of required spare capacity, the number of spare VPIs and the anticipated elongation of paths, the two most attractive solutions to pre-planned VP restoration appear to be the failure independent scheme and the source-based rerouting version of failure dependent protection. The latter should achieve the lowest spare capacity provisioning if optimisation is performed and, when VPI sharing is used, leaves fewer VPI numbers idle. Also, for the network models considered, path elongation was minimised with source-based rerouting. Regarding VPI redundancy, Kawamura postulated that the ratio of working to backup VPs in any link does not cause concern for VPI availability where disjoint backup paths are assigned (Kawamura et al, 1994). Although span disjoint rerouting demonstrated greater sensitivity to path elongation, this could be remedied by a joint working/protection VP layout that minimises elongation effects. On the basis of these observations, therefore, it may be argued that the principal advantage of source-based rerouting is the prospect of spare capacity minimisation, though the computational effort needed to attain this, and how much would be gained over the failure independent scheme, remain open for investigation.

The potential advantage of source-based rerouting is offset by the distinct disadvantage of much greater storage overheads needed to support alternate routing plans. In addition, routing computation will be far more intensive for all failure dependent techniques than for the failure independent approach. This combination of factors tends to swing the balance in favour of the failure independent protection routing paradigm. This preference is consolidated by analysis of the qualitative issues of protocol complexity and robustness. The penultimate section of the paper discussed how backup path activation could be executed with simple OAM cell transmission protocols. These same protocols could be used whether spans or nodes fail. Indeed, to plan for node failures, node disjoint backup routes can be allocated, with no additional storage overheads incurred to support this mode of failure recovery. Furthermore, the confirmation of backup path availability allows detection of multiple failures or limited spare capacity. All of these features of failure independent rerouting are in sharp contrast to the failure dependent approach, which requires significant extra computation and storage space to accommodate failures other than single span failures. The restoration protocol itself is complicated by unanticipated failures, and if there is limited spare capacity to support rerouted paths, there is no specified distributed mechanism to recognise the condition.
To conclude, the failure independent rerouting scheme for pre-planned Virtual Path
restoration  incorporates  properties  of  resource  efficiency,  low  implementation  complexity 
and  robustness,  which  combine  to  make  it  a  suitable  foundation  for  planning  survivable 
ATM  networks. 
Abstract

Virtual Path Bandwidth (VPB) control and Virtual Circuit Routing (VCR) control are competing control schemes for traffic management in ATM networks. The objective of both controls is to minimize the Call Blocking Probability (CBP) of the congested end-to-end links, under constraints posed by the transmission link capacities of the network. Firstly, we compare the performance of two VCR control schemes, DAR and DCR, which are well known from STM networks, considering several trunk reservation parameters and different control intervals. Secondly, we compare the performance of VPB control schemes with that of VCR control schemes, under both static and dynamic traffic conditions. Under static traffic conditions, the efficiency of the two control schemes in minimizing the worst CBP of the network is examined, whereas under dynamic traffic conditions their response time is measured by means of simulation. In short, VPB control is more effective than VCR control when the traffic fluctuation is large, while VCR control has a faster response time than VPB control.

Keywords 

Virtual  Path  Bandwidth  Control,  Dynamic  Routing  Control,  ATM  networks. 
1 INTRODUCTION

In ATM networks, network/traffic management has a layered structure of two levels, the Call-level and the Cell-level, which correspond to the call and cell components of traffic, respectively. We concentrate on Call-level traffic management, and especially on controls which strongly influence the global performance of an ATM network under constraints posed by the bandwidth capacity of transmission links. Virtual Path Bandwidth (VPB) control and Virtual Circuit Routing (VCR) control are the main controls closely related to transmission link capacity. Their performance is evaluated by the Call Blocking Probability (CBP). Bandwidth and trunk reservation controls are also related to transmission link capacity and operate closely with either VPB or VCR control.

In  this  paper,  we  compare  the  performance  of  VCR  control  schemes,  also  called  Dynamic 
Routing  (DR)  (Mase,  1989),  with  the  performance  of  VPB  control  schemes  (Logothetis  1992, 
Shioda  1994),  in  the  environment  of  ATM  networks. 

The  VCR  control  objective  is  to  provide  an  alternate  route  for  each  Virtual  Circuit  Connection 
(VCC)  that  fails  to  be  established  on  the  first  choice  (direct)  Virtual  Path  Connection  (VPC), 
exploiting  the  spare  capacity  of  the  network.  The  VPB  control  objective  is  to  rearrange  the 
installed  bandwidth  of  the  VPs  according  to  the  offered  traffic  fluctuation  so  as  to  minimize  the 
worst  (maximum)  CBP  of  all  end-to-end  links. 

Several  DR  control  schemes  have  been  proposed  for  use  in  the  traditional  telephone  networks: 

a)  Dynamic  Non-Hierarchical  Routing  (DNHR),  a  time-dependent  routing  scheme  developed 
by  AT&T  (Ash,  1990), 

b) Trunk Status Map Routing (TSMR), an extension of DNHR that modifies the routing patterns calculated by DNHR according to the trunk status (Ash, 1985),

c)  Dynamic  Alternative  Routing  (DAR),  a  decentralized  state-dependent  routing  developed  by 
British  Telecom  (Stacey  1987,  Key  1990), 

d)  Dynamically  Controlled  Routing  (DCR),  a  centralized  version  of  the  state-dependent 
dynamic  routing  (Rengier  1983,  Cameron  1983), 

e) State and Time-dependent Routing (STR), a hybrid routing scheme proposed by NTT that combines time-dependent control in defining the routing pattern with state-dependent control in VC-level routing (Mase, 1990).

We have chosen two of the above DR control schemes to serve as VCR control schemes in ATM networks: the decentralized control scheme DAR and the centralized control scheme DCR. Before comparing their performance with that of VPB control schemes, we compare their performance in minimizing the worst CBP with each other, when they cooperate with several Trunk Reservation control schemes and when different control intervals are considered.

The performance of the VCR and VPB control schemes is examined under static and dynamic traffic conditions on a test-bed ATM network of 10 nodes, in a ring topology, accommodating two service-classes. Under static traffic conditions we examine the performance of VPB and VCR controls in minimizing the worst CBP of the whole network. The applied VPB control is optimal and is obtained analytically, through a global network optimization model. The results of applying the VCR control schemes are obtained through simulation. Under dynamic traffic conditions, we examine the response time of the above control schemes. For VPB control we consider the Medium-Term VPB control scheme described in (Logothetis, Shioda, 1995), with a sufficiently long control interval, because bandwidth rearrangement takes considerable time owing to the call connections already in progress at that moment. As for the incorporated bandwidth and trunk reservation control schemes, the bandwidth reservation scheme which equalizes the CBP of the two service-classes is considered for VPB control, while several trunk reservation schemes are considered for the VCR control schemes. For the dynamic traffic condition, we assume that traffic fluctuates according to a step function (a theoretical case), applied to one switching pair in one traffic-flow direction only.

This paper is organized as follows. Section 2 describes an ATM network architecture suitable for applying VPB and VCR control schemes. Section 3 presents the objective of VPB control and the VPB control schemes. Section 4 includes three subsections. Subsections 4.1 and 4.2 describe the VCR control schemes DAR and DCR, respectively, and give the calculation of the resulting CBP in the VPs of an ATM network. In subsection 4.3 the two VCR control schemes are compared. Firstly, they are compared with respect to the resulting average CBP of the network, under static traffic conditions and in cooperation with several trunk reservation control schemes. Secondly, the same comparison is carried out using the best trunk reservation control scheme (obtained from the first comparison) while the control (update) interval of the DCR control scheme varies. In Section 5, the VPB and VCR control schemes are compared under static (subsection 5.1) and dynamic (subsection 5.2) traffic conditions. Finally, Section 6 summarizes the results of this paper.


2  ATM  NETWORK  ARCHITECTURE 

An ATM network architecture is considered in which each ATM switch (ATM-SW) is accompanied by an ATM Cross-Connect (ATM-XC) system. The ATM-XCs are interconnected by a ring transmission line and compose the backbone network (Figure 1a). This architecture has the advantage of simplicity and offers higher transmission line utilization (Sato, 1990). The transmission links are assumed bi-directional. A connection between two ATM-SWs is established via any available path registered in a table called the Routing Table (RT). In this paper, the route of a path between two ATM-SWs passes through ATM-XCs only.

Other network topologies could also be considered. In the backbone network topology of Figure 1a, two parts can be distinguished to simplify our study: one composed of the ATM-XCs, called the outer network, and another composed of the interconnected ATM-XCs, called the inner network.

Thanks to the Virtual Path (VP) concept, traffic management by reallocating the established bandwidth of the paths according to traffic variations (VPB management) becomes attractive in ATM networks. The VP concept, whereby two ATM-SWs see only the direct logical link (VP) between them, makes the structure of the backbone network transparent to the ATM-SW pairs. This is due to the flexibility of the ATM-XCs in providing the required bandwidth on the end-to-end links (VP connections) between ATM-SWs. Therefore, from the VPB management point of view, the whole ATM network is equivalent to a meshed network in which only the direct links are used (Figure 1b). These links represent the VPCs.

Since we assume the equivalent mesh network architecture, in which the ATM-SWs are fully interconnected by VPCs, the first choice route for a VC to establish a VCC is its direct VPC. When the VCC is blocked on the first choice route, an alternate route consisting of two VPCs is attempted (according to the applied VCR control scheme). This routing scheme meets the basic requirements for applying the well-known DR control schemes of STM networks (Yokoi, 1995).
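In the equivalent meshed network of Figure 1b, the candidate route set for each ATM-SW pair is simply the direct VPC followed by every two-VPC alternate through a third ATM-SW. The Python sketch below enumerates that set for unordered switch pairs; node indices and function names are illustrative. For the 10-node test-bed it yields 45 switching pairs, each with one direct VPC and 8 two-VPC alternates.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Route = Tuple[int, ...]


def candidate_routes(n: int) -> Dict[Tuple[int, int], List[Route]]:
    """Direct VPC first, then every two-VPC alternate via one other ATM-SW."""
    routes: Dict[Tuple[int, int], List[Route]] = {}
    for a, b in combinations(range(n), 2):
        alternates = [(a, t, b) for t in range(n) if t not in (a, b)]
        routes[(a, b)] = [(a, b)] + alternates
    return routes


routes = candidate_routes(10)
print(len(routes))          # 45 switching pairs
print(len(routes[(0, 1)]))  # 9 = 1 direct VPC + 8 two-VPC alternates
```

The VCR schemes below differ only in how they pick among these alternates when the direct VPC is full.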

Figure 1a ATM network architecture.


Figure 1b Equivalent meshed VPC network.

The VCR controller can be either decentralized, like DAR, or centralized, like DCR. In the case of a decentralized control scheme, each ATM-SW has its own VCR controller, which learns about the traffic-flow conditions in the VPCs of the network by counting the number of VCC failures, in order to define the route for the next call arrival (next VCC). On the other hand, a centralized VCR controller is located at a network management centre and determines alternative VPCs to realize a VCC for each ATM-SW pair of the network. It does this by receiving the traffic conditions of the VPCs from each ATM-SW every few seconds and exploiting the idle capacity of the VPCs.
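The centralized controller's use of idle VPC capacity can be sketched as follows. This Python example assumes, in the spirit of the classical DCR, that each two-VPC alternate a-t-b is weighted by its residual idle capacity (the smaller of the two legs' idle bandwidth, less a reservation margin) and that a tandem ATM-SW is recommended with probability proportional to that weight. The function names, the `reserve` parameter and the data layout are hypothetical and may differ from the DCR variant evaluated in this paper.

```python
import random
from typing import Dict, FrozenSet, Iterable, Optional

Idle = Dict[FrozenSet[int], int]  # idle bandwidth units per VPC (ATM-SW pair)


def tandem_weights(idle: Idle, a: int, b: int, nodes: Iterable[int],
                   reserve: int = 0) -> Dict[int, int]:
    """Residual idle capacity of each two-VPC route a-t-b, net of a reservation."""
    weights = {}
    for t in nodes:
        if t in (a, b):
            continue
        residual = min(idle[frozenset((a, t))], idle[frozenset((t, b))]) - reserve
        weights[t] = max(residual, 0)
    return weights


def recommend_tandem(weights: Dict[int, int], rng: random.Random) -> Optional[int]:
    """Pick a tandem with probability proportional to its residual capacity."""
    candidates = [t for t, w in weights.items() if w > 0]
    if not candidates:
        return None  # no alternate route has spare capacity this interval
    return rng.choices(candidates, weights=[weights[t] for t in candidates], k=1)[0]


idle = {frozenset((0, 2)): 5, frozenset((2, 1)): 3,
        frozenset((0, 3)): 1, frozenset((3, 1)): 8}
w = tandem_weights(idle, 0, 1, range(4), reserve=1)
print(w)                                       # {2: 2, 3: 0}
print(recommend_tandem(w, random.Random(42)))  # 2 (the only positive weight)
```

Recomputing these weights every update interval is what ties DCR's performance to the control interval that is varied later in the paper.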
The VPB controller is located at an administrative centre (centralized controller). It
communicates  with  the  ATM-SWs  to  collect  the  measurements  of  carried  traffic  and  blocking 
during  each  control  interval.  Based  on  these  measurements,  it  calculates  the  offered  traffic. 
From  the  offered  traffic,  the  installed  bandwidth  in  the  transmission  links  and  the  VPs  listed  in 
the  RT,  the  VPB  controller  determines  the  allocation  of  the  bandwidth  to  the  VPs,  by  solving  a 
large  network  optimization  model.  Then,  it  updates  the  data  relevant  to  the  VP  bandwidth  in  the 
ATM-SWs.  The  realization  of  the  produced  VPB  allocation  is  executed  by  the  ATM-SWs 
simultaneously,  after  a  delay  due  to  the  existing  call-connections  at  the  time  point  of  bandwidth 
rearrangement.  The  ATM-SWs  increase  or  decrease  the  number  of  cells  which  have  a  specific 
Virtual  Path  Identifier  (Saito,  1991)  when  the  bandwidth  of  this  VP  is  increased  or  decreased, 
accordingly.  It  is  worth  mentioning  that  no  communication  between  the  VPB  controller  and  the 
ATM-XCs  is  required. 
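The step from measurements to offered traffic can be illustrated with the usual loss-system relation, carried traffic = offered traffic × (1 − CBP). The Python sketch below inverts that relation. It assumes that blocked calls are lost rather than retried, and the function name is illustrative; the controller's actual estimation procedure may be more refined.

```python
def offered_traffic(carried_erlangs: float, blocking: float) -> float:
    """Invert carried = offered * (1 - CBP), assuming blocked calls are lost."""
    if not 0.0 <= blocking < 1.0:
        raise ValueError("blocking probability must lie in [0, 1)")
    return carried_erlangs / (1.0 - blocking)


# 15 erlangs carried with 25% of calls blocked implies 20 erlangs offered.
print(offered_traffic(15.0, 0.25))  # 20.0
```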


3  VPB  CONTROL 

Telecommunication networks are designed to convey the traffic of all switching pairs so as to meet a prescribed QOS. Because traffic varies from hour to hour, the traffic load on some switching pairs is below the forecast value and free bandwidth results. At the same time, overloads occurring on other switching pairs cannot use this free bandwidth unless the surplus bandwidth can be transferred towards the congested switching pairs. This transfer is the task of VPB control: it reallocates the bandwidth of the VPs according to the offered traffic so as to improve the global performance of the network, under constraints posed by the transmission link capacities. The resulting distribution of the total installed bandwidth among the VPs is the VPB allocation.

To  rearrange  the  VP  bandwidth  dynamically,  the  following  types  of  VPB  control  schemes 
have  been  proposed: 

a)  Very-Short-Term  control  schemes  based  on  the  information  of  the  concurrent 
connections  in  the  VPs  (Ohta,  1988),  with  control  interval  less  than  5  min. 

b) Short-Term control schemes based on the blocking measurements taken during the control interval, which ranges from several minutes to a few hours (Shioda, 1991).

c)  Long-Term  control  schemes  based  on  traffic  prediction  with  control  interval  ranging 
from  a  few  hours  to  a  few  days  (Monteiro,  1990). 

d)  Medium-Term  VPB  control  based  on  traffic  measurements,  with  control  interval  ranging 
from  several  minutes  to  a  few  hours  (Logothetis,  Shioda,  1995). 

The Very-Short-Term and Short-Term controls must be distributed control schemes in order to respond quickly to sharp traffic fluctuations and absorb them. To achieve this, they need very simple computations. They can ignore the traffic characteristics of service-classes (Ohta, 1988), which is an important advantage in the B-ISDN environment. The Very-Short-Term control achieves optimal network performance. However, it is very difficult to implement, because a large number of control steps is needed, especially when the traffic volume is large; it is therefore only of theoretical value. The Short-Term control schemes are readily implemented but lack optimality.

On the other hand, Long-Term control is a centralized control in which the controller aims at optimal network performance over the control interval by solving a large network optimization problem. However, the controller relies on prediction of the offered traffic, which is time-consuming and cannot be fully accurate, so the value of the achieved optimality is weakened. The main advantage of Long-Term control schemes is that they are easy to implement, because VP bandwidth is rearranged only a few times per day.

The Medium-Term VPB control scheme strikes a balance between the advantages and disadvantages of the Short-Term and Long-Term control schemes. The controller must be centralized in order to optimize network performance globally within its control interval, and the control interval must be rather short in order to respond satisfactorily to medium-term traffic fluctuations. Short-term traffic fluctuations could then be absorbed by applying VCR control at a further stage. To achieve Medium-Term VPB control, the controller formulates a global network optimization model driven by the offered traffic, which is determined from on-line measurements of the carried traffic and of the CBP of each service-class in the network. The optimization criterion is to minimize the worst CBP over all VPCs (Logothetis 1993, Logothetis 1995).
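The minimax criterion can be illustrated on a much smaller problem. The following Python sketch is not the authors' network-wide optimization model: it assumes a single transmission link, one service-class per VPC and Erlang-B blocking, and uses a hypothetical greedy rule (`minimax_allocation`) that hands out bandwidth units one at a time to the VPC whose CBP is currently the worst.

```python
from typing import List


def erlang_b(servers: int, load: float) -> float:
    """Erlang-B blocking via the recursion B(n) = A*B(n-1) / (n + A*B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b


def minimax_allocation(loads: List[float], capacity: int) -> List[int]:
    """Split `capacity` bandwidth units among VPCs so as to lower the worst CBP.

    Hypothetical single-link, single-class simplification of Medium-Term VPB control.
    """
    alloc = [0] * len(loads)
    for _ in range(capacity):
        # Give the next unit to the VPC that currently has the highest blocking.
        worst = max(range(len(loads)), key=lambda j: erlang_b(alloc[j], loads[j]))
        alloc[worst] += 1
    return alloc


if __name__ == "__main__":
    loads = [10.0, 20.0, 30.0]  # illustrative offered loads (Erl), not from the paper
    alloc = minimax_allocation(loads, capacity=75)
    print(alloc, max(erlang_b(n, a) for n, a in zip(alloc, loads)))
```

Because each VPC's blocking decreases as it gains bandwidth units, giving every new unit to the currently worst VPC is a natural way to push the maximum down; the illustrative loads are not taken from the paper.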

4  VCR  CONTROL 

VCR control is a dynamic alternate routing method that updates the set of possible alternate VPCs for each ATM-SW pair either according to the state of the network (state-dependent) or according to preplanned routing patterns calculated to meet the forecast traffic demand for each period of the day (time-dependent). Compared with fixed alternate routing, dynamic alternate routing offers higher utilization of network resources (and hence cost savings) and greater tolerance of network failures and traffic fluctuations.

In this paper, two conventional dynamic routing control schemes, Dynamic Alternative Routing and Dynamically Controlled Routing, are examined for their applicability to ATM networks.


Figure  2  Flow  diagram  for  DAR. 
4.1 Dynamic Alternative Routing (DAR)

DAR is an example of a decentralized routing control scheme. Under this algorithm, a VCC that fails on the first-choice (direct) VPC is offered to the current-choice alternate route (composed of two VPCs); if it is blocked there as well, a new current choice is selected at random from all possible alternate routes, to be used for the next VCC attempt (Figure 2).
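The DAR decision sequence of Figure 2 can be sketched as follows. This is a minimal illustration, assuming a hypothetical `admit` callback that checks (and seizes) capacity on every VPC of a route; the class and parameter names are illustrative, not taken from the paper.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Route = Tuple[str, ...]  # sequence of VPC identifiers


class DAR:
    """Sketch of sticky-random Dynamic Alternative Routing."""

    def __init__(self, alternates: Dict[str, List[Route]], seed: int = 0):
        self.alternates = alternates  # switching pair -> candidate two-VPC routes
        self.rng = random.Random(seed)
        # Current-choice alternate route of each pair, initially chosen at random.
        self.current = {r: self.rng.choice(routes) for r, routes in alternates.items()}

    def route_call(self, pair: str, direct: str,
                   admit: Callable[[Route], bool]) -> Optional[Route]:
        """Return the route used by a new VCC, or None if the call is blocked."""
        if admit((direct,)):
            return (direct,)  # first choice: the direct VPC
        alt = self.current[pair]
        if admit(alt):
            return alt  # the current-choice alternate is kept while it succeeds
        # Blocked on the alternate too: reselect at random for the next attempt.
        self.current[pair] = self.rng.choice(self.alternates[pair])
        return None
```

The only state DAR keeps is one current-choice route per switching pair, which is what makes it decentralized: no network-wide information is needed to route a call.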

Performance evaluation of an ATM network controlled by DAR

To evaluate the performance of an ATM network controlled by DAR, we determine the CBP of the VPCs. For the long-run stationary behavior of the network, we extend the methodology of (Gibbens 1989, Key 1989, Mitra 1991) to the ATM environment, considering that each VP is shared by two service-classes $c_k$ (k = 1, 2), each requiring bandwidth $b_{c_k}$ per call.

The following notation is used:

$V_s$: Bandwidth assigned to VPC $s$.

$r(1)$: First-choice VPC used by switching pair $r$.

$r(2)$: Alternate route of two VPCs used by switching pair $r$.

$R_r$: Set of all possible alternate routes for switching pair $r$.

$R_s$: Set of switching pairs that use VPC $s$ as the first or the second VPC of their alternate routes ($r \in R_s$ for $r: s \in r(2)$).

$a_r^l$: Probability that alternate route $l$ is selected for switching pair $r$.

$l(s)$: Alternate route that contains VPC $s$.

$l_1(s)$: First VPC of alternate route $l(s)$.

$l_2(s)$: Second VPC of alternate route $l(s)$.

$\rho^1_{k,s}$: First-choice (direct) Poisson traffic offered to VPC $s$ by service-class $c_k$.

$\rho^2_{k,s}$: Alternate traffic (assumed Poisson) offered to VPC $s$ by service-class $c_k$.

$B^1_{k,s}$: CBP of the first-choice traffic offered to VPC $s$ by service-class $c_k$.

$B^2_{k,s}$: CBP of the alternate traffic offered to VPC $s$ by service-class $c_k$.

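The alternate traffic of each service-class $c_k$ offered to VPC $s$ is determined as:

$$\rho^2_{k,s} = \sum_{r \in R_s} \rho^1_{k,r(1)}\, B^1_{k,r(1)}\, a_r^{l(s)} \bigl(1 - B^2_{k,s'}\bigr), \qquad s' \in r(2) \setminus \{s\}, \quad k = 1, 2 \qquad (1)$$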

In the long run, since alternate routes are selected at random and a route is kept as the current choice until it blocks a call, and since the blocking rates over the two VPCs of an alternate route are equalized, the selection probability $a_r^{l(s)}$ of an alternate route turns out to be inversely proportional to the blocking of that route:

$$a_r^{l(s)} \propto \frac{1}{1 - \bigl(1 - B^2_{k,l_1(s)}\bigr)\bigl(1 - B^2_{k,l_2(s)}\bigr)}, \qquad \sum_{l \in R_r} a_r^l = 1 \qquad (2)$$
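Given the alternate-traffic blocking of each VPC, eq. (2) amounts to weighting each alternate route by the inverse of its route blocking and normalizing over $R_r$. A minimal sketch for one switching pair and one service-class (the class index is dropped; the function and dictionary names are hypothetical):

```python
from typing import Dict, Tuple


def dar_selection_probabilities(route_links: Dict[str, Tuple[str, str]],
                                link_blocking: Dict[str, float]) -> Dict[str, float]:
    """Selection probabilities a_r^l of eq. (2) for one switching pair r.

    route_links   : alternate route l -> (first VPC, second VPC)
    link_blocking : VPC -> blocking of alternate traffic on that VPC
    """
    weights = {}
    for route, (first, second) in route_links.items():
        b1, b2 = link_blocking[first], link_blocking[second]
        route_blocking = 1.0 - (1.0 - b1) * (1.0 - b2)  # blocking of the two-VPC route
        weights[route] = 1.0 / route_blocking  # assumes route_blocking > 0
    total = sum(weights.values())
    return {route: w / total for route, w in weights.items()}
```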
For the determination of the CBPs of each VPC of the network, we consider only the call-level characteristics of the service-classes. We propose to use the recursive formula of (Kaufman 1981, Roberts 1982) for the determination of the CBPs, taking into account the bandwidth reservation control between the service-classes. As has been observed (Logothetis, 1992), this formula is highly accurate, especially when the service-classes have the same mean service time. To apply this formula to DAR, we consider four traffic streams $t_k$ (k = 1, 2, 3, 4) offered to each VPC: $t_1$ and $t_3$ are the first-choice and the alternate traffic of the first service-class, respectively, whereas $t_2$ and $t_4$ are the first-choice and the alternate traffic of the second service-class, respectively.

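The CBPs of the VPCs are determined as:

$$B_{t_k} = \frac{1}{G} \sum_{n=0}^{b_{t_k} + R(t_k) - 1} G(V_s - n) \qquad (3)$$

where

$$G = \sum_{i=0}^{V_s} G(i) \qquad (4)$$

$$G(i) = \frac{1}{i} \sum_{k=1}^{4} \rho_{t_k}\, D_{t_k}(i - b_{t_k})\, G(i - b_{t_k}), \qquad i = 1, \ldots, V_s \qquad (5)$$

with $G(0) = 1$ and $G(i) = 0$ for $i < 0$, and

$$D_{t_k}(i - b_{t_k}) = \begin{cases} b_{t_k} & \text{for } i \le V_s - R(t_k) \\ 0 & \text{for } i > V_s - R(t_k) \end{cases} \qquad (6)$$

Here $\rho_{t_k}$ and $b_{t_k}$ denote the offered traffic and the bandwidth per call of stream $t_k$.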


$R(t_k)$ is the bandwidth reserved for traffic stream $t_k$ by the Bandwidth and the Trunk Reservation Controls (Figure 3).

In this way, we have formulated for the ATM environment a system of equations (1)-(6), which is solved iteratively by computer. This system is equivalent to the fixed-point system of equations that is valid for the STM environment.
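Equations (3)-(6) translate directly into a recursion over bandwidth units. The sketch below evaluates one VPC for fixed offered traffic; in the full model it would be called inside the iteration that updates the alternate traffic of eq. (1) until the CBPs converge. The function name and the (traffic, bandwidth, reservation) tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple


def kr_blocking(V: int, streams: List[Tuple[float, int, int]]) -> List[float]:
    """Per-stream CBP on one VPC from eqs. (3)-(6).

    V       : VPC bandwidth V_s in bandwidth units
    streams : (rho_tk, b_tk, R_tk) = offered traffic (Erl), bandwidth per call,
              reserved bandwidth R(t_k) for each traffic stream t_k
    """
    G = [0.0] * (V + 1)
    G[0] = 1.0  # starting point of the recursion
    for i in range(1, V + 1):
        acc = 0.0
        for rho, b, R in streams:
            # D_tk(i - b_tk) = b_tk if i <= V - R(t_k), else 0 (eq. 6)
            if i - b >= 0 and i <= V - R:
                acc += rho * b * G[i - b]
        G[i] = acc / i  # eq. (5)
    total = sum(G)  # eq. (4)
    cbp = []
    for rho, b, R in streams:
        # Stream t_k is blocked in states V_s - n, n = 0 .. b_tk + R(t_k) - 1 (eq. 3)
        blocked = sum(G[V - n] for n in range(min(b + R, V + 1)))
        cbp.append(blocked / total)
    return cbp
```

With a single stream of one bandwidth unit and no reservation, the recursion reduces to the Erlang-B formula, which gives a quick sanity check. For VPCs as large as 672 units under heavy traffic, the unnormalized G(i) can overflow, so a practical implementation would rescale G during the recursion.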


4.2 Dynamically Controlled Routing (DCR)

DCR is an example of a centralized routing control scheme. It uses a central processor to find an alternate route (composed of two VPCs) for each switching pair of the network, based on the free capacity of the VPCs of the whole network. The central processor:

•  gathers,  during  its  control  interval,  all  the  appropriate  information  (VPC  trunk  status, 
traffic, etc.),  from  each  ATM-SW, 

•  calculates,  for  each  switching  pair,  the  alternate  route  selection  probability  which  is 
proportional  to  the  measured  idle  capacity  of  the  alternate  route  set, 

•  selects  the  alternate  route  based  on  the  selection  probability, 

•  sends  the  alternate  route  information  to  the  ATM-SWs. 

The control (update) interval of DCR is on the order of a few seconds; the theoretical case of a zero control interval can also be considered.
Figure 3 Bandwidth and Trunk Reservation in a VPC.

Determination  of  Call  Blocking  Probability 

For the determination of the CBP in an ATM network controlled by DCR, the notation of the DAR analysis is used, together with the following:

$L_s$: Residual capacity of VPC $s$.

$C_s$: Occupied bandwidth of VPC $s$.

$T_s$: Trunk Reservation number of VPC $s$.

$L_{l1}$: Residual capacity of the first VPC of alternate route $l \in R_r$.

$L_{l2}$: Residual capacity of the second VPC of alternate route $l \in R_r$.

Under stationary traffic conditions, the CBPs of a network under DCR control are obtained from the same system of equations as under DAR (Girard, 1990). In DCR, however, the selection probabilities of the alternate routes are computed, for a zero update interval, as follows.

First, the residual capacity $L_s$ of VPC $s$ is computed as:

$$L_s = V_s - C_s - T_s \qquad (7)$$


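$C_s$ is calculated as the bandwidth occupied by the total traffic carried on VPC $s$:

$$C_s = \sum_{k=1}^{2} \Bigl[\rho^1_{k,s}\bigl(1 - B^1_{k,s}\bigr) + \rho^2_{k,s}\bigl(1 - B^2_{k,s}\bigr)\Bigr] b_{c_k} \qquad (8)$$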


The residual capacity of an alternate route of two VPCs is computed as:

$$L_l = \min(L_{l1}, L_{l2}) \qquad (9)$$


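and the selection probability of alternate route $l$ for switching pair $r$ is proportional to its residual capacity:

$$a_r^l = \frac{L_l}{\sum_{l' \in R_r} L_{l'}} \qquad (10)$$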
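The DCR computations of eqs. (7)-(9), together with the rule that the selection probability is proportional to the measured idle capacity of each alternate route, can be sketched as follows. Clipping negative residual capacities to zero and returning all-zero probabilities when no route has idle capacity are assumptions of this sketch, not statements from the paper; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def carried_bandwidth(first: List[Tuple[float, float]],
                      alt: List[Tuple[float, float]],
                      b: List[int]) -> float:
    """Eq. (8): C_s = sum_k [rho1_k (1 - B1_k) + rho2_k (1 - B2_k)] * b_k.

    first[k] and alt[k] are (offered traffic, CBP) pairs of class k on VPC s.
    """
    return sum((r1 * (1.0 - b1) + r2 * (1.0 - b2)) * bk
               for (r1, b1), (r2, b2), bk in zip(first, alt, b))


def dcr_selection(route_links: Dict[str, Tuple[str, str]],
                  V: Dict[str, float], C: Dict[str, float],
                  T: Dict[str, float]) -> Dict[str, float]:
    """Selection probabilities of the alternate routes of one switching pair."""
    L = {s: V[s] - C[s] - T[s] for s in V}  # eq. (7)
    route_res = {l: max(0.0, min(L[first_vpc], L[second_vpc]))  # eq. (9), clipped
                 for l, (first_vpc, second_vpc) in route_links.items()}
    total = sum(route_res.values())
    if total == 0.0:
        return {l: 0.0 for l in route_links}  # no idle capacity on any route
    return {l: x / total for l, x in route_res.items()}  # proportional to idle capacity
```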
4.3 Comparison of the VCR control schemes

Two reservation parameters, the Bandwidth and the Trunk Reservation numbers, must be considered in a VCR control scheme in order to improve the performance of multi-service networks such as ATM networks. Bandwidth Reservation aims at guaranteeing the QOS of each service-class multiplexed in a VP by reserving a fraction of the VP bandwidth for the service-classes that require larger bandwidth. Thus, calls of service-class $c_k$ are rejected when less than $t(c_k)$ bandwidth is available in the VP. With a proper choice of the Bandwidth Reservation number, the resulting CBPs of the two service-classes in each VP can be equalized. Trunk Reservation, on the other hand, aims at guaranteeing network stability when an alternate routing scheme is applied: it protects the first-choice traffic offered to a VP against alternate-routed traffic that uses the same VP. The appropriate Trunk Reservation number depends on the VP bandwidth and on the traffic load offered to the VPs.
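The admission rule implied by the two reservation mechanisms can be sketched with the values used in Section 5 (first service-class: 1 unit per call; second: 24 units; 23 units of Bandwidth Reservation against the first service-class; a Trunk Reservation of 48 units). As a simplification, the sketch assumes that the Trunk Reservation applies only to alternate-routed streams and simply adds to the Bandwidth Reservation to give R(t_k); the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    bandwidth: int       # b_{c_k}: bandwidth units per call
    bw_reservation: int  # units kept free for wider classes (Bandwidth Reservation)
    alternate: bool      # True for alternate-routed (overflow) traffic


def reserved(stream: Stream, trunk_reservation: int) -> int:
    """R(t_k): total reservation seen by a stream (assumed additive)."""
    return stream.bw_reservation + (trunk_reservation if stream.alternate else 0)


def admit(free_units: int, stream: Stream, trunk_reservation: int) -> bool:
    """Accept a call only if at least b + R(t_k) units are free on the VPC."""
    return free_units >= stream.bandwidth + reserved(stream, trunk_reservation)


# Streams t1..t4 with the Section 5 values (64 kbps = 1 unit, 1.536 Mbps = 24 units).
t1 = Stream(bandwidth=1, bw_reservation=23, alternate=False)   # class 1, first choice
t2 = Stream(bandwidth=24, bw_reservation=0, alternate=False)   # class 2, first choice
t3 = Stream(bandwidth=1, bw_reservation=23, alternate=True)    # class 1, alternate
t4 = Stream(bandwidth=24, bw_reservation=0, alternate=True)    # class 2, alternate
```

With these values both first-choice classes need 24 free units, which is how the Bandwidth Reservation equalizes their CBPs, while alternate-routed calls of either class need 72.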


Table 1 Average CBP (%) of the network versus Trunk Reservation number and maximum traffic fluctuation (first row of each pair: DAR; second row: DCR)

Trunk Reservation   Control   Maximum Traffic Fluctuation (%)
Number                        10    20    30    40    50    60    70    80
0                   DAR       1.91  2.59  3.50  3.47  4.49  5.23  5.77  6.14
                    DCR       1.72  1.62  2.46  3.31  4.22  4.85  5.27  5.99
24                  DAR       1.07  1.12  1.43  1.64  1.98  2.39  2.86  3.31
                    DCR       0.72  0.90  1.17  1.74  1.95  2.48  2.85  3.39
48                  DAR       1.00  1.10  1.31  1.62  2.03  2.31  2.87  3.20
                    DCR       0.86  0.97  1.34  1.44  1.81  2.17  2.59  3.13
72                  DAR       1.22  1.44  1.56  1.87  2.20  2.65  3.09  3.36
                    DCR       0.96  1.07  1.34  1.65  2.11  2.45  2.88  3.12

Table 1 gives the average CBP of an ATM network (described below) operating with the DAR (first entry) or the DCR (second entry) VCR control scheme, versus the Trunk Reservation number. The same Trunk Reservation number is used on every VP-link, and the Bandwidth Reservation number is chosen so that the CBPs of the two service-classes are equalized. Table 1 shows that, for small traffic fluctuations, the CBP of the network increases once the Trunk Reservation number grows beyond a moderate value, whereas larger traffic fluctuations call for a larger Trunk Reservation number.

The performance of the two VCR control schemes described above is examined on the ATM network of 10 ATM-SWs described below. The Trunk Reservation number is taken from Table 1 and corresponds to the best value for each traffic fluctuation. Five versions of DCR are considered: DCR-0, with a zero update interval, and DCR-5, DCR-10, DCR-15 and DCR-20, with update intervals of 5, 10, 15 and 20 s, respectively. Figure 4 shows the average CBP of the whole network operating under DAR or under DCR-0, DCR-5, DCR-10, DCR-15 and DCR-20, versus the traffic fluctuation. The results show that DCR-0 has the best performance, while DAR performs better than DCR-5, DCR-10, DCR-15 and DCR-20. In practice, however, DCR-0 cannot be applied: since DCR is a centralized control, it requires a control interval of at least a few seconds. Figure 5 shows the average CBP of the whole network operating with DAR and DCR versus the control interval. When the control interval is small, DCR performs better than DAR.
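Choosing the best Trunk Reservation number for each traffic fluctuation amounts to taking the column-wise minimum of Table 1. The short sketch below does this with the Table 1 values transcribed into Python lists (variable and function names are illustrative):

```python
TRUNK_RESERVATION = [0, 24, 48, 72]
FLUCTUATION = [10, 20, 30, 40, 50, 60, 70, 80]  # maximum traffic fluctuation (%)

# Average CBP (%) from Table 1; rows follow TRUNK_RESERVATION.
DAR = [
    [1.91, 2.59, 3.50, 3.47, 4.49, 5.23, 5.77, 6.14],
    [1.07, 1.12, 1.43, 1.64, 1.98, 2.39, 2.86, 3.31],
    [1.00, 1.10, 1.31, 1.62, 2.03, 2.31, 2.87, 3.20],
    [1.22, 1.44, 1.56, 1.87, 2.20, 2.65, 3.09, 3.36],
]
DCR = [
    [1.72, 1.62, 2.46, 3.31, 4.22, 4.85, 5.27, 5.99],
    [0.72, 0.90, 1.17, 1.74, 1.95, 2.48, 2.85, 3.39],
    [0.86, 0.97, 1.34, 1.44, 1.81, 2.17, 2.59, 3.13],
    [0.96, 1.07, 1.34, 1.65, 2.11, 2.45, 2.88, 3.12],
]


def best_trunk_reservation(table):
    """For each fluctuation column, return the Trunk Reservation number with the lowest CBP."""
    best = {}
    for col, fluct in enumerate(FLUCTUATION):
        row = min(range(len(TRUNK_RESERVATION)), key=lambda r: table[r][col])
        best[fluct] = TRUNK_RESERVATION[row]
    return best
```

For DCR it returns 24 at 10% fluctuation and 72 at 80%, while for DAR the minimum lies at 48 in most columns.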


Figure 4 Average CBP versus maximum traffic fluctuation (%) for DAR and DCR.
Figure 5 Average CBP versus control interval for DAR and DCR.
5 COMPARISON OF VPB CONTROL WITH VCR CONTROL

The performance of VPB control and VCR control is compared under static and dynamic traffic conditions on a ring ATM network of 10 ATM-SWs. Under static traffic conditions, the average and the worst CBP of the network are presented; under dynamic traffic conditions, the response time of the two traffic controls is examined.

Two service-classes are accommodated in the network. The bandwidth required per VCC is 64 kbps for the first service-class (taken as the bandwidth unit, i.e. one trunk capacity) and 1.536 Mbps (24 bandwidth units) for the second. Because of the Bandwidth Reservation Control, 1.472 Mbps (23 bandwidth units) are reserved in each VP to benefit the second service-class. The Trunk Reservation number is taken from Table 1 and corresponds to the best value for each traffic fluctuation. Both service-classes have exponentially distributed holding times with a mean of 100 s.

The VPs of the network are dimensioned to satisfy a grade of service of 3% (CBP). The traffic offered to each ATM-SW is 260 Erl for the first service-class and 12 Erl for the second. Each VP has a bandwidth of 43.008 Mbps (672 bandwidth units). The bandwidth of a transmission link (between two ATM-SWs) is calculated as the sum of the bandwidths of the VPs that use this transmission link.
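The bandwidth figures above follow from the 64 kbps bandwidth unit. A quick arithmetic check (helper names are hypothetical):

```python
UNIT_KBPS = 64  # one bandwidth unit (one trunk capacity) = 64 kbps


def units_to_mbps(units: int) -> float:
    """Convert bandwidth units to Mbps."""
    return units * UNIT_KBPS / 1000.0


def offered_units(erlangs_per_class, units_per_call):
    """Mean offered bandwidth in units: sum of Erlangs times units per call."""
    return sum(a * b for a, b in zip(erlangs_per_class, units_per_call))


print(units_to_mbps(24), units_to_mbps(23), units_to_mbps(672))  # 1.536 1.472 43.008
print(offered_units([260, 12], [1, 24]))  # 548
```

So the traffic offered to each ATM-SW corresponds, on average, to 548 bandwidth units of demand, of which 288 come from the second service-class.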

5.1  Static  Traffic  Conditions 

The average and the worst CBP of the whole network are examined when the offered traffic fluctuates randomly, according to a uniform distribution, by up to a given maximum percentage of the design traffic load; this maximum ranges from 10% to 80% in steps of 10%.
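One way to generate such fluctuating loads is sketched below. The text does not state whether the fluctuation is symmetric around the design load; this sketch assumes that each ATM-SW pair's load is scaled by a factor drawn uniformly from [1 - f, 1 + f], where f is the maximum fluctuation. The function and pair names are hypothetical.

```python
import random
from typing import Dict


def fluctuate(design: Dict[str, float], max_fluct: float, seed: int = 0) -> Dict[str, float]:
    """Scale each pair's design load by a uniform factor in [1 - f, 1 + f].

    Assumption: symmetric fluctuation around the design load, f = max_fluct / 100.
    """
    rng = random.Random(seed)
    f = max_fluct / 100.0
    return {pair: load * rng.uniform(1.0 - f, 1.0 + f) for pair, load in design.items()}


# Example: one random traffic pattern with up to 80% fluctuation.
print(fluctuate({"SW1-SW2": 100.0, "SW1-SW3": 50.0}, 80))
```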

In  Figure  6,  the  worst  CBP  of  the  network  is  shown  versus  the  maximum  traffic  fluctuation, 
when  VPB  control,  VCR  control  and  No-Control  are  applied  to  the  network.  Figure  6a  shows 
the  results  of  No-Control,  DAR  and  VPB  control  comparatively,  whereas  Figure  6b  shows  the 
results  of  No-Control,  DCR  and  VPB  control.  The  resultant  worst  CBP  of  DAR  and  DCR  is 
obtained  through  simulation  (Logothetis,  Kokkinakis,  1995),  while  the  results  of  VPB  control 
are  obtained  analytically  and  are  optimal  (Logothetis  1993,  Logothetis  1995).  As  we  can 
observe,  the  VPB  control  is  more  effective  when  the  maximum  traffic  fluctuation  is  large,  while 
when  the  traffic  fluctuation  is  small  the  VCR  controls  perform  better  than  VPB  control. 

In Figure 7, the average CBP of the whole network is shown versus the traffic fluctuation. Figure 7a presents the results of No-Control, DCR and VPB control, whereas Figure 7b presents those of No-Control, DAR and VPB control. When a VCR control is applied, the network performance with respect to the average CBP is better for all traffic fluctuations. It is worth noting that the objective of VPB control is to minimize the maximum CBP of the network; therefore, once this criterion is satisfied, no action is taken to improve the average CBP of the network.

Figures 8a and 8b compare the worst CBP and the average CBP of the network, respectively, versus the maximum traffic fluctuation. The DAR and DCR curves of Figures 6a and 6b are plotted together in Figure 8a; likewise, the DCR and DAR curves of Figures 7a and 7b are plotted together in Figure 8b.
Figure 6a
Figure 6b Worst CBP versus maximum traffic fluctuation for the VPB and DAR controls.
Figure 7a Average CBP versus maximum traffic fluctuation for the VPB and DAR controls.
Figure 7b Average CBP versus maximum traffic fluctuation for the VPB and DCR controls.
Figure 8a Worst CBP versus maximum traffic fluctuation for the DAR and DCR controls.
Figure 8b Average CBP versus maximum traffic fluctuation for the DAR and DCR controls.
5.2 Dynamic Traffic Conditions

Under dynamic traffic conditions, we examine the response time of the VPB and VCR controls for the theoretical case of a step function applied to one ATM-SW pair (in one traffic-flow direction): the traffic offered to that ATM-SW pair increases in a step of 100% in both service-classes.

First, a medium-term VPB control scheme is applied (Logothetis, Shioda, 1995) with a control interval of 30 min; that is, the VPB rearrangement procedure starts every 30 minutes. We assume that the traffic fluctuation occurs at the end of the second control interval (i.e. after 60 min). A Bandwidth Reservation of 23 bandwidth units is applied to the first service-class. Second, DAR control, a decentralized control scheme governing each call arrival, is applied. Third, DCR control is applied with a zero control interval (DCR-0) and, fourth, with a 10 s control interval (DCR-10). A Trunk Reservation of 48 bandwidth units is applied to benefit the first-choice path for all VCR controls. The CBP of each ATM-SW pair is measured every 15 min.

Figure 9 shows the worst CBP of the network versus time. The response time of VPB control is 75 min (from 60 min to 135 min). VCR controls absorb the traffic variation faster than VPB control because of their very short control interval. VPB control needs a considerably longer control interval because of the time required to rearrange bandwidth while call connections are still in progress at the moment of rearrangement.


Figure 9 Response time for the VPB and VCR controls.
6 CONCLUSION

Two traffic controls, VPB control and VCR control, are presented for ATM networks, and the following comparisons are made:

a)  The  performance  of  two  VCR  control  schemes,  the  DAR  and  DCR,  is  examined, 
considering  several  trunk  reservation  parameters.  A  larger  trunk  reservation  number  is  needed 
when  the  traffic  variation  among  the  ATM-SW  pairs  of  the  network  increases. 

b)  The  same  comparison  is  carried  out,  considering  various  control  intervals  for  the  DCR 
control  (centralized  control).  A  very  small  control  interval  is  needed  for  the  DCR  control  to 
achieve  a  better  performance  than  the  DAR. 

c) Under static traffic conditions, VPB control is compared with the VCR (DAR and DCR) controls with respect to their effectiveness in minimizing the worst CBP of the network; the worst CBP of the network without any control is shown as a reference. VPB control is more effective than VCR control when the traffic fluctuation is large.

d) Under dynamic traffic conditions, the response time of each traffic control scheme is measured by simulation. VCR control responds faster than VPB control, owing to the considerably longer control interval required by VPB control. Nevertheless, the response time of VPB control is satisfactory if network recovery within two control intervals is acceptable.
Abstract

We discuss simulation results concerning the performance of the TCP protocol when running over high-speed ATM networks. Two network topologies are considered: a simple network topology, comprising just two ATM switches and supporting 3 TCP connections, and a candidate Italian ATM network topology comprising ten ATM switches and supporting 6 TCP connections. In all simulation scenarios the TCP traffic is mixed with some background traffic whose level is taken as a variable parameter. Both the background traffic and the TCP traffic are either unshaped, or shaped according to the GCRA algorithm.

The effect of the background traffic on TCP performance is discussed, varying the buffering capacity within nodes as well as the peak bit rate that each TCP connection is allowed to use. Numerical results clearly show that shaping the TCP traffic according to fixed parameters significantly improves both the goodput and the efficiency of the TCP connections with respect to the case in which no traffic shaping is implemented. Moreover, the performance achievable with an adaptive shaping of the TCP traffic (using a simplified version of the ABR ATM transfer capability) is observed to be extremely satisfactory.


Keywords 

ATM,  simulation,  TCP,  traffic  control,  traffic  shaping,  ABR 


1  INTRODUCTION 

The evolution of the ATM standards and products towards the LAN market clearly indicates that the first ATM networks will be mainly used to transport data traffic for business applications. Even in the long run, however, data traffic is expected to remain a significant part of the load in ATM networks. It is thus very important that the high-level protocols used to implement data applications be carefully investigated with respect to their adaptability to the ATM environment.

TCP (Transmission Control Protocol) is today the de facto standard transport protocol for data applications in the LAN, MAN and WAN areas. Many experts believe that TCP will remain the most frequently used transport protocol in the ATM environment for a long time to come, even though it has been recognized that TCP is not specifically tailored to networks with a high bandwidth-delay product.

Some studies of the behaviour and performance of TCP over ATM networks have already appeared in the literature (Romanow 1994, Meempat 1994, Bianco 1994, Ajmone 1995b, Perloff 1995). Our work concentrates on the effect that the heterogeneous traffic present in the network, which we call background traffic, may have on TCP performance. The importance of background traffic goes beyond the reduction of the bandwidth available to TCP, since background traffic interferes with the TCP behaviour by altering the probability of cell losses within node buffers. Moreover, we also investigate the influence of “traffic shaping” on TCP performance. Shaping the TCP traffic at the network ingress may be a reasonable approach to let the network control the TCP source rate without requiring a substantial rewriting of the TCP code itself. Note, however, that a negotiation phase between the user and the network is necessary in order to agree on a given peak cell transmission rate; this rate limits the throughput of the TCP connection even during periods of low network load, when the throughput achievable by the TCP connection could be higher. A possible solution to this drawback is the use of shaping devices that adapt the peak cell transmission rate of a TCP source according to feedback signals conveyed by the network. Such a solution was foreseen by the ATM Forum within the ABR (Available Bit Rate) ATM transfer capability (ATM Forum 1995). We investigate the viability of this solution by studying the effectiveness of a simplified version of ABR.


2  PERFORMANCE  RESULTS 

The results presented in this paper are obtained via simulation with CLASS, an ATM network simulator recently developed at Politecnico di Torino (Ajmone 1995a). To obtain a model of the TCP protocol, we adapted the officially distributed C code of the BSD 4.3-Reno release (Jacobson 1990), without considering the delayed and selective ACK options (for details see Ajmone 1995). The simulation software was validated by comparison with measurements performed on an experimental ATM LAN; furthermore, an approximate analytical model is being developed for a simple network configuration.

In all the simulation scenarios that we considered, TCP connections are assumed to perform a long file transfer from a TCP transmitter to a TCP receiver: the TCP transmitter sends only data segments, and the TCP receiver returns only ACK segments. TCP sources operate in sustained overload: segments are always ready at the transmitter when an ACK is received. The size of the buffers at the TCP transmitters is set to a value large enough to avoid any loss at the source while a TCP segment is fragmented into ATM cells. The TCP receivers are assumed to be fast enough and to have enough buffer space to avoid losses. The maximum window size is set to a value that allows a single TCP transmitter to obtain the full available bandwidth on the link. The TCP protocol is assumed to always transmit segments of 9140 bytes (9180 bytes including IP and TCP overhead), the suggested maximum segment size for TCP over ATM; TCP segments are segmented into cells by the AAL5 sublayer, which adds 8 overhead bytes.
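To see what AAL5 segmentation means for a 9180-byte segment, the sketch below counts cells: the 8-byte trailer is appended and the frame is padded to a multiple of the 48-byte cell payload. Constant and function names are illustrative.

```python
import math

CELL_PAYLOAD = 48  # payload bytes per ATM cell
CELL_SIZE = 53     # total bytes per ATM cell (5-byte header + 48-byte payload)
AAL5_TRAILER = 8   # overhead bytes added by AAL5, as stated in the text


def aal5_cells(pdu_bytes: int) -> int:
    """Cells needed for one AAL5 frame: PDU plus trailer, padded to 48-byte units."""
    return math.ceil((pdu_bytes + AAL5_TRAILER) / CELL_PAYLOAD)


cells = aal5_cells(9180)  # TCP segment including IP and TCP headers
wire_bytes = cells * CELL_SIZE
print(cells, wire_bytes)  # 192 10176
```

A 9180-byte segment thus needs 192 cells, i.e. 10176 bytes on the link. This is slightly more than the 9180 × 53/48 ≈ 10136 bytes implied by a plain 53/48 factor, because of the trailer and the padding.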

The background traffic messages are generated according to a Poisson process, with a truncated geometric message length distribution with a mean of 20 cells and a maximum length of 200 cells; the background traffic is segmented by the AAL3/4 sublayer.
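A generator for this background traffic could look like the following sketch. It assumes that the truncated geometric distribution is the geometric law on {1, ..., 200} renormalized after truncation, with its parameter q chosen numerically so that the mean is exactly 20 cells (q ends up close to 0.95), and it converts a load in Mbit/s of user data into a message rate using 44 user bytes per cell, consistent with the 53/44 factor quoted below. All names are hypothetical.

```python
import math
import random
from typing import Iterator, Tuple

MAX_CELLS = 200           # maximum message length in cells (from the text)
TARGET_MEAN = 20.0        # mean message length in cells (from the text)
USER_BYTES_PER_CELL = 44  # AAL3/4 user payload per cell (assumption, matches 53/44)


def truncated_geometric_mean(q: float, n_max: int) -> float:
    """Mean of P(n) = (1 - q) q**(n - 1) / (1 - q**n_max), n = 1..n_max."""
    norm = 1.0 - q ** n_max
    return sum(n * (1.0 - q) * q ** (n - 1) for n in range(1, n_max + 1)) / norm


def solve_q(target: float, n_max: int, iters: int = 100) -> float:
    """Bisection on q: the truncated mean grows monotonically with q."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_geometric_mean(mid, n_max) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_length(q: float, n_max: int, rng: random.Random) -> int:
    """Inverse-CDF draw using F(n) = (1 - q**n) / (1 - q**n_max)."""
    u = rng.random()
    n = math.ceil(math.log(1.0 - u * (1.0 - q ** n_max)) / math.log(q))
    return min(n_max, max(1, n))  # guard against rounding at the edges


def message_rate(load_mbps: float, mean_cells: float = TARGET_MEAN) -> float:
    """Messages per second for a background load in Mbit/s of user data."""
    return load_mbps * 1e6 / (mean_cells * USER_BYTES_PER_CELL * 8)


def background_messages(rate: float, duration_s: float, q: float, n_max: int,
                        seed: int = 1) -> Iterator[Tuple[float, int]]:
    """Poisson message arrivals (exponential gaps) with truncated-geometric lengths."""
    rng = random.Random(seed)
    t = rng.expovariate(rate)
    while t < duration_s:
        yield t, sample_length(q, n_max, rng)
        t += rng.expovariate(rate)
```

For example, `list(background_messages(message_rate(10.0), 0.01, solve_q(TARGET_MEAN, MAX_CELLS), MAX_CELLS))` yields roughly 14 messages for 10 Mbit/s of user data over 10 ms.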

The burstiness of both the TCP connections and the background traffic can be controlled with a shaping device that operates according to an adaptation of the GCRA (Generic Cell Rate Algorithm) recommended by ITU-T for traffic policing in ATM networks (ITU-T 1992).

A GCRA shaper controls the cell interdeparture time by delaying cells that are scheduled for transmission too early. The basic parameters of the GCRA shaper are the bandwidth allocation factor β, which is the amount of bandwidth allocated to the connection relative to its mean bandwidth, and the cell delay variation tolerance τ, which is the amount of time that a cell is allowed to “accelerate” with respect to its expected arrival time. When the background traffic is shaped, we assign to each connection β = 1.2 and τ = 0 in the case of the simple 2-node network, while in the Italian network the bandwidth allocation factor is β = 1.5.
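The shaping behaviour can be sketched with the virtual-scheduling form of the GCRA, modified so that a non-conforming cell is held back rather than discarded. This is one plausible adaptation, not necessarily the one implemented in CLASS; the increment T is taken as 1/(β × mean cell rate), and the function names are hypothetical.

```python
from typing import Iterable, List


def increment_for(mean_cell_rate: float, beta: float) -> float:
    """GCRA increment T = 1 / (beta * mean cell rate), in seconds per cell."""
    return 1.0 / (beta * mean_cell_rate)


def gcra_shape(arrivals: Iterable[float], increment: float, tolerance: float) -> List[float]:
    """Departure times of cells passed through a GCRA(T, tau)-based shaper.

    Virtual scheduling: a cell arriving at t conforms if t >= TAT - tau. Instead of
    discarding a non-conforming cell (policing), the shaper holds it until TAT - tau.
    Arrivals must be sorted; departures then stay in FIFO order.
    """
    tat = float("-inf")  # theoretical arrival time of the next cell
    departures = []
    for t in arrivals:
        dep = max(t, tat - tolerance)  # earliest conforming departure time
        departures.append(dep)
        tat = max(dep, tat) + increment
    return departures
```

With τ = 0, as in the 2-node scenario, a burst of back-to-back cells leaves exactly T apart; a positive τ lets a few cells leave early before the spacing is enforced.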

Numerical results are presented as curves referring to two performance indices:

• the useful throughput at the TCP receivers, called goodput, obtained by considering the received data but discarding all faulty and retransmitted segments;

• the efficiency of the TCP connections, i.e., the ratio between the goodput and the total offered load of the TCP connections.

Curves are plotted as functions of the background traffic load, expressed in Mbit/s of user data; the background load on the link can be computed by multiplying the abscissa values by 53/44. The TCP goodput is also expressed in Mbit/s of user data, for uniformity with what is generally done in the literature, counting the whole 9180-byte segments. Thus, to obtain the link utilization, the TCP goodput must be divided by the efficiency, multiplied by a factor 53/48 (since AAL5 is used) and added to the background load on the link. The background traffic is formed by 9 different connections (in addition, an identical background traffic flows on the backward link).
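The conversion from plotted quantities to link load described above can be written down directly. A minimal sketch, assuming the 150 Mbit/s channel rate of the two-node network as the default capacity (the function names are hypothetical):

```python
def link_load_mbps(tcp_goodput: float, efficiency: float, background: float) -> float:
    """Total link load in Mbit/s from the plotted quantities.

    tcp_goodput : TCP goodput in Mbit/s of user data (whole 9180-byte segments)
    efficiency  : goodput / offered TCP load, 0 < efficiency <= 1
    background  : background load in Mbit/s of user data (AAL3/4, 44 bytes per cell)
    """
    tcp_on_link = tcp_goodput / efficiency * 53 / 48
    background_on_link = background * 53 / 44
    return tcp_on_link + background_on_link


def link_utilization(tcp_goodput: float, efficiency: float, background: float,
                     capacity: float = 150.0) -> float:
    """Fraction of the channel capacity in use."""
    return link_load_mbps(tcp_goodput, efficiency, background) / capacity
```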

Simulations were run either until the receiver throughput reached a 98% precision with 95% confidence, or stopped after about one minute of simulated time. However, except when ABR-like services are simulated, the 2-node network with 100 Mbit/s of background traffic was so overloaded that one of the TCP connections was forced to close by the backoff mechanism, a symptom that the network is not working properly. For this reason, the results of the simulation runs with a background traffic load of 100 Mbit/s must be interpreted very carefully.
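The stopping rule can be sketched as a batch-means confidence-interval test. The paper does not describe its estimator, so this sketch assumes independent batch means, a normal quantile for the 95% interval, and reads “98% precision” as a half-width of at most 2% of the estimated mean; the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def precision_reached(batch_means: Sequence[float], rel_precision: float = 0.02,
                      confidence: float = 0.95) -> bool:
    """True when the CI half-width is at most rel_precision times the mean."""
    n = len(batch_means)
    if n < 2:
        return False
    mean = statistics.mean(batch_means)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96
    half_width = z * statistics.stdev(batch_means) / math.sqrt(n)
    return half_width <= rel_precision * abs(mean)


def should_stop(batch_means: Sequence[float], simulated_time_s: float,
                time_limit_s: float = 60.0) -> bool:
    """Stop on precision or after about one minute of simulated time."""
    return precision_reached(batch_means) or simulated_time_s >= time_limit_s
```

With few batches a Student-t quantile would be more appropriate than the normal one; the normal quantile keeps the sketch within the standard library.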

2.1 The two-node network

We first consider a very simple network, whose topology is sketched in Fig. 1 and which comprises only two ATM switches. The data rate on each channel is 150 Mbit/s, and channel L0, linking the two ATM switches, is the system bottleneck. Three TCP connections share the network resources with a variable amount of background traffic.

The performance of this very simple ATM network was studied in detail in (Ajmone 1995b) as a function of three variables: the TCP connection length (the network span), the background traffic load and characteristics, and the TCP traffic shaping parameters.
Figure 1 The simulated two-node ATM network


We  only  report  here  some  figures  that  help  the  reader  understand  the  novel  results  and 
also  serve  as  a  reference  when  considering  more  complex  scenarios. 

The results presented in Fig. 2 are obtained when all the links L0, L1, L2, L3 in Fig. 1 have a length of 500 km, while the links from the second ATM switch to the TCP receivers are assumed to have negligible length. The TCP connection length is thus 1000 km. The results are presented as a function of the background traffic load, for two values of the node buffer size in front of the congested link L0 (1000 and 5000 cells), with shaped background traffic. As expected, when no shaping is performed on the TCP traffic (dotted lines with square markers), the TCP goodput steadily decreases with increasing background traffic; on the other hand, an increase in the node buffer size results in an increase in the TCP goodput, although this increase is not very significant, as can be observed by comparing the lines with square markers in the left-hand figures.

The results obtained when the traffic on the TCP connections is shaped are presented on the same charts, with plus and diamond markers for 50 and 15 Mbit/s shaping, respectively, assuming a cell delay variation tolerance $\tau = 0$. These shaping values correspond to 1/3 and 1/10 of the bottleneck link capacity. The results are rather interesting: they show that smoothing the burstiness of the traffic offered to the network allows the TCP connections to better exploit the available resources. In particular, when 50 Mbit/s shaping is enforced on the TCP connections and no background traffic is present, the TCP connections completely saturate the link, obtaining 149.4 of the 150 available Mbit/s, whereas without shaping the goodput does not exceed 83 Mbit/s with 5000-cell node buffers. In every case, the goodput achieved with 50 Mbit/s shaping is greater than the unshaped goodput, regardless of the node buffer size and the background load.

The situation is slightly different for the curves with 15 Mbit/s shaping. Here the TCP goodput is limited by the shaping function rather than by the window mechanism, and it remains constant for background loads up to 75 Mbit/s. At high background loads, this goodput is greater than that obtained both without shaping and with 50 Mbit/s shaping. Interestingly, with a 100 Mbit/s background load, when the network is clearly overloaded, the performance of the TCP connections is basically the same in all three cases considered.

Figure 2 Average goodput and efficiency of the TCP connections for the simulated two-node network with 1000 km connections, as a function of the node buffer size and the background load; the background traffic is shaped

More insight can be gained by looking at the efficiency of the TCP protocol (the charts on the right-hand side of Fig. 2). These curves clearly show that as soon as the total traffic offered to the network exceeds the bottleneck link capacity, the efficiency of the TCP protocol becomes very poor, dropping to 0.5 or even less. Moreover, the burstier the traffic, the poorer the efficiency. This result is also confirmed by simulation runs without shaping of the background traffic, in which the TCP performance (not reported in the graphs) is even poorer. The only acceptable, and indeed remarkably good, situation is the one with 15 Mbit/s shaping, whose efficiency remains equal to 1 (no segment loss was recorded) up to a background traffic load of 75 Mbit/s; when the background load reaches 100 Mbit/s, the network is, as already stated, overloaded in all cases. Interestingly, with a node buffer size of 1000 cells, the efficiency of TCP without shaping and with 50 Mbit/s shaping seems to increase slightly at background loads of 50 and 75 Mbit/s, after having dropped to about 0.5. This phenomenon might be due to statistical fluctuations, but we believe it is more probably due to phasing phenomena like those examined in (Romanov 1994, Bianco 1994).

Figure 3 Goodput and efficiency of the TCP connections for the two-node network with 1000, 550, and 505 km connections, as a function of the shaped background traffic load, with buffer size equal to 5000 cells

The second set of results that we discuss considers TCP connections with different lengths. This situation may be very common in reality and deserves investigation, since the TCP control mechanism is known to be biased against longer connections. In this scenario the goodput of each connection is computed and plotted separately. With reference to the topology of Fig. 1, the bottleneck link L0 is set to 500 km, while the lengths of links L1, L2 and L3 are set to 5, 50 and 500 km, respectively, resulting in connections of 505, 550 and 1000 km. Fig. 3 reports the results for a buffer size of 5000 cells, when the TCP connections are either unshaped or shaped at 50 Mbit/s. The results with 15 Mbit/s shaping are not reported for the sake of brevity, since all the connections obtain exactly the same goodput.

When the TCP connections are unshaped, the goodput obtained by each connection is inversely proportional to its length, as expected, since TCP throughput is roughly inversely proportional to the round-trip delay. This unfair behaviour is evident, and it can be remedied by adopting 50 Mbit/s shaping. In this case the goodput difference between the 505 km and 550 km connections is negligible, while the 1000 km connection still gets less bandwidth, although with low background traffic loads this difference becomes less significant.
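The inverse dependence on length can be illustrated with the usual window-limited bound, throughput ≤ W/RTT. The sketch below is not the simulator's TCP model: it assumes a hypothetical fixed window of 64 KiB shared by all connections and a propagation delay of 5 µs per km, and it ignores queueing and transmission delays. Only the ratios between the three connections are meaningful.

```python
"""Window-limited TCP throughput bound W / RTT for the 505, 550 and 1000 km connections.

Assumptions (illustrative only, not taken from the simulations):
  * every connection uses the same hypothetical window of 64 KiB;
  * propagation delay is 5 microseconds per km of fibre;
  * queueing and transmission delays are ignored.
"""

PROPAGATION_S_PER_KM = 5e-6      # assumed fibre propagation delay
WINDOW_BYTES = 64 * 1024         # hypothetical TCP window, same for all connections


def round_trip_time_s(length_km: float) -> float:
    """Round-trip propagation delay (seconds) of a connection of the given length."""
    return 2.0 * length_km * PROPAGATION_S_PER_KM


def window_limited_mbps(length_km: float, window_bytes: int = WINDOW_BYTES) -> float:
    """Upper bound on throughput (Mbit/s): at most one window per round-trip time."""
    return window_bytes * 8 / round_trip_time_s(length_km) / 1e6


if __name__ == "__main__":
    shortest = window_limited_mbps(505)
    for length in (505, 550, 1000):
        bound = window_limited_mbps(length)
        print(f"{length:5d} km  RTT {round_trip_time_s(length) * 1e3:5.2f} ms  "
              f"bound {bound:6.1f} Mbit/s  ({bound / shortest:.2f} of the 505 km value)")
```

With a common window, doubling the length halves the bound, which is why the 1000 km connection ends up with roughly half the share of the 505 km one; a fixed shaping rate caps all connections at the same value and so reduces this dependence.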
Figure 4 Goodput and efficiency of the TCP connections for the two-node network with
three  100  km  connections,  as  a  function  of  the  assigned  bandwidth,  with  1000-cell  buffers; 
the  background  traffic  load  is  set  to  either  15  or  45  Mbit/s 


It is interesting to notice that the efficiency of a connection is independent of its length, even though no shaping is performed on the connections. This means that the losses due to buffer overflow are roughly proportional to the bandwidth grabbed by each connection.

Let us now consider what happens if we plot the results as a function of the bandwidth assigned to each TCP connection. In this case, with reference to Fig. 1, we set all link lengths to 50 km, thus simulating 100 km connections. The size of the buffer in front of the congested link L0 is set to 1000 cells. Fig. 4 reports the results obtained in this scenario. Each graph contains two pairs of curves: the first pair is obtained with a background traffic level of 45 Mbit/s, the second with a background traffic level of 15 Mbit/s. Within each pair, one curve refers to shaped and the other to unshaped background traffic.

The difference between the curves obtained with shaped and with unshaped background traffic is negligible, because the buffer is large enough to accommodate the background traffic bursts. All curves show the following behaviour: as long as the assigned bandwidth does not overload the link (assigned bandwidth up to 30 Mbit/s with 45 Mbit/s background, and up to 37.5 Mbit/s with 15 Mbit/s background), the efficiency of the TCP connections remains equal to one, and hence the average goodput increases linearly. As soon as the link becomes overloaded, the TCP efficiency drops to about 0.5 and the goodput decreases.
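The two overload points quoted above follow from simple arithmetic on the 150 Mbit/s bottleneck shared by three TCP connections and the background traffic. The following is a minimal sketch, assuming (as for the adaptive shaper described later in this section) that candidate shaping rates are submultiples C/N_s of the link rate; 30 and 37.5 Mbit/s are of this form. It ignores cell-level burstiness, so it only predicts where overload starts, not the size of the efficiency drop.

```python
from typing import List, Optional

LINK_MBPS = 150.0   # capacity of the bottleneck link L0
NUM_TCP = 3         # TCP connections sharing L0


def shaper_rates(capacity: float, max_divisor: int = 10) -> List[float]:
    """Candidate shaping rates C / N_s for N_s = 1..max_divisor, highest first."""
    return [capacity / n for n in range(1, max_divisor + 1)]


def largest_safe_rate(background: float,
                      capacity: float = LINK_MBPS,
                      n_conn: int = NUM_TCP) -> Optional[float]:
    """Largest per-connection rate whose total offered load does not exceed capacity."""
    for rate in shaper_rates(capacity):
        if n_conn * rate + background <= capacity:
            return rate
    return None  # even the smallest candidate rate overloads the link


if __name__ == "__main__":
    for bg in (15.0, 45.0):
        print(f"background {bg:4.1f} Mbit/s -> up to {largest_safe_rate(bg)} Mbit/s per TCP connection")
```

For example, with 45 Mbit/s of background traffic, 3 × 37.5 + 45 = 157.5 Mbit/s exceeds the link rate, whereas 3 × 30 + 45 = 135 Mbit/s does not.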

The results presented in Fig. 4 show that, at least in a static setting, it is possible to identify a shaping rate that optimizes the throughput of the TCP connections for a given background load; this same value also allows the maximum exploitation of the network resources without QoS degradation. This observation suggests using adaptive shaping algorithms, like those specified in the ABR ATM transfer capability (ATM Forum 1995), for the transport of TCP connections.

In order to investigate the performance of the TCP protocol on an ABR-like transfer capability, a simplified version of ABR was implemented in the CLASS simulator. This implementation follows the ATM Forum guidelines (ATM Forum 1995) but considers only the key aspects of the algorithms, neglecting the details that are believed not to play a significant role in determining system performance. For this reason, to avoid confusion, we shall refer to this scenario as adaptive shaping, rather than ABR.

The  main  features  of  adaptive  shaping  are  as  follows. 

•  The  interaction  between  the  end  users  and  the  network  is  managed  through  special 
RM  (resource  management)  cells  that  are  transmitted  in  band.  RM  cells  convey  only  a 
ternary  feedback  to  sources:  either  Increase  Rate  (IR),  or  Keep  Rate  (KR),  or  Decrease 
Rate  (DR).  This  ternary  feedback  is  intended  to  guide  the  behavior  of  the  source  shaper. 

• TCP sources shape their traffic and insert into their cell flow one RM cell every $N_{RM}$ information cells; the feedback of each RM cell is always initialized to IR by the source.

• TCP receivers route RM cells back toward their corresponding sources, without changing the feedback carried in the RM cells.

• ATM switches monitor the traffic on the forward link, trying to identify any congestion situation. However, switches can modify the feedback in RM cells only when these reach the switch on their backward trip to the source (this is done in order to reduce the distance, and hence the delay, between the control point and the source). The feedback in an RM cell can be modified only from IR to either KR or DR, or from KR to DR. This prevents nodes closer to the source from setting the feedback to a more optimistic value than that set by nodes farther away, which may be experiencing congestion.

• ATM switches determine their congestion state by monitoring the occupancy of the buffer associated with the link on which forward RM cells are routed. Congestion is determined using two thresholds, $T_l$ and $T_h$. If the buffer occupancy is below $T_l$, the switch does not modify the feedback carried in RM cells, which thus keeps its current value (IR unless previously reduced by other nodes); if the occupancy is above $T_h$, the node sets the feedback in RM cells to DR; if the occupancy is between the two thresholds, the feedback is set to KR (unless it was already set to DR, in which case it remains DR).

• Source shaping devices always follow the indication contained in an RM cell; the time needed to adapt the rate is negligible. The transmission rate can only be set to a value that divides the capacity of the link, i.e., $R_T = C/N_s$, where $C$ is the capacity of the link and $N_s$ is an integer. When a source shaping device receives an RM cell with DR feedback, $N_s$ is increased by 1; when an RM cell carries IR feedback, $N_s$ is decreased by 1. This introduces a quantization effect that in some cases, especially when the bandwidth requirements of the connections are high, may affect the performance of the system by introducing oscillations in the transmission rates.

• Switches are able to enforce fairness in the partition of the bandwidth among connections.
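The marking and rate-update rules listed above can be sketched as follows. This is a simplified illustration, not the CLASS implementation: all names are hypothetical, an occupancy exactly equal to a threshold is treated as lying between the thresholds (the text does not specify this case), and the bounds on N_s (`n_min`, `n_max`) are assumptions. `n_min = 3` reproduces the 50 Mbit/s maximum rate used in the two-node experiments.

```python
from enum import IntEnum


class Feedback(IntEnum):
    """Ternary RM-cell feedback, ordered from most to least optimistic."""
    IR = 0  # Increase Rate
    KR = 1  # Keep Rate
    DR = 2  # Decrease Rate


def switch_mark(current: Feedback, occupancy: int, t_low: int, t_high: int) -> Feedback:
    """Feedback carried by a backward RM cell after it passes a switch.

    Below t_low the feedback is left unchanged, above t_high it becomes DR,
    in between it becomes KR. Taking the maximum means feedback can only move
    towards DR (IR -> KR/DR, KR -> DR), never back towards IR.
    """
    if occupancy < t_low:
        proposed = Feedback.IR
    elif occupancy > t_high:
        proposed = Feedback.DR
    else:
        proposed = Feedback.KR
    return max(current, proposed)


class SourceShaper:
    """Hypothetical source shaper whose rate is restricted to R_T = C / N_s."""

    def __init__(self, capacity: float, n_min: int = 3, n_max: int = 150):
        self.capacity = capacity
        self.n_min = n_min      # assumed bound: C / n_min is the maximum rate
        self.n_max = n_max      # assumed bound on the minimum rate
        self.n_s = n_min        # start at the maximum allowed rate

    @property
    def rate(self) -> float:
        return self.capacity / self.n_s

    def on_backward_rm(self, feedback: Feedback) -> None:
        """Apply the feedback of a returning RM cell (adaptation assumed instantaneous)."""
        if feedback == Feedback.DR:
            self.n_s = min(self.n_s + 1, self.n_max)
        elif feedback == Feedback.IR:
            self.n_s = max(self.n_s - 1, self.n_min)
        # KR: keep the current rate


if __name__ == "__main__":
    # Thresholds at 1% and 50% of a 5000-cell buffer.
    t_low, t_high = 50, 2500
    shaper = SourceShaper(capacity=150.0)
    fb = Feedback.IR                                                   # set by the source
    fb = switch_mark(fb, occupancy=3000, t_low=t_low, t_high=t_high)   # congested switch
    fb = switch_mark(fb, occupancy=10, t_low=t_low, t_high=t_high)     # idle switch cannot undo DR
    shaper.on_backward_rm(fb)
    print(fb.name, shaper.rate)
```

Encoding the feedback as an ordered enumeration turns the rule "feedback may only become more pessimistic" into a single `max`, so no switch can overwrite a DR set by a congested switch elsewhere on the path.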

Numerical results were obtained with $N_{RM} = 32$ (one RM cell every 32 information cells) and by setting the two thresholds $T_l$ and $T_h$ to 1% and 50%, respectively, of the switch buffer size.

Figure 5 Goodput and efficiency of the TCP connections for the two-node network with adaptive shaping, with 1000 km connections, as a function of the background load, with buffer size equal either to 1000 or to 5000 cells

Figure 6 Goodput and efficiency of the TCP connections for the two-node network with adaptive shaping, with 1000, 550, and 505 km connections, as a function of the shaped background traffic load, with buffer size equal to 5000 cells

Fig. 5 reports the results obtained with adaptive shaping in the two-node network scenario with 1000 km connections. These results should be compared with those reported in Fig. 2; here, however, throughput and efficiency are plotted separately for each TCP connection. The dotted straight line represents the free bandwidth available to each connection (obtained by subtracting the background and RM traffic loads from the link data rate, dividing the result by the number of TCP connections, and multiplying by 48/53 to account for the ATM cell overhead). The maximum allowed transmission rate is 50 Mbit/s for each TCP connection. Observe that the scale of the efficiency plots is greatly magnified with respect to that of Fig. 2. The performance improvements obtained with adaptive shaping are remarkable, in particular the large increase in efficiency. A further important consideration is that performance is now much better with 5000-cell buffers than with 1000-cell buffers: the buffers must be large enough to absorb the transient phase between congestion detection and transmitter adaptation, whose duration is proportional to the network span. The efficiency with 1000-cell buffers is also affected by the granularity of the shaper rates mentioned above. Indeed, at medium loads the transmission rate keeps oscillating between roughly 15 Mbit/s and 50 Mbit/s; a transmission rate of 50 Mbit/s is clearly too high for the network load, but since the only allowed rates above 25 Mbit/s are 30, 37.5 and 50 Mbit/s, the transmission rate is increased too quickly, leading to buffer overflows. This phenomenon is attenuated at high loads, because the transmission rates are lower and the rate granularity becomes negligible.
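The granularity problem follows directly from the constraint that rates are submultiples of the link capacity, $R_T = C/N_s$: an IR step moves from $N_s$ to $N_s - 1$ and multiplies the rate by $N_s/(N_s - 1)$, so steps are large when $N_s$ is small. The sketch below lists the rates available to a connection capped at 50 Mbit/s on a 150 Mbit/s link, together with the relative size of the next IR step. It is pure arithmetic on those two figures; the function name is hypothetical.

```python
from typing import Iterator, Tuple

LINK_MBPS = 150.0
MAX_RATE_MBPS = 50.0   # maximum allowed rate per TCP connection (N_s >= 3)


def rate_ladder(capacity: float, max_rate: float,
                min_rate: float) -> Iterator[Tuple[int, float, float]]:
    """Yield (N_s, rate, relative increase of one IR step) for rates in [min_rate, max_rate]."""
    n = 1
    while capacity / n > max_rate:   # smallest N_s that respects the rate cap
        n += 1
    n_min = n
    while capacity / n >= min_rate:
        # An IR feedback moves N_s from n to n - 1, i.e. multiplies the rate by n / (n - 1);
        # at the cap (n == n_min) no further increase is possible.
        step = 1.0 / (n - 1) if n > n_min else 0.0
        yield n, capacity / n, step
        n += 1


if __name__ == "__main__":
    for n_s, rate, step in rate_ladder(LINK_MBPS, MAX_RATE_MBPS, min_rate=5.0):
        print(f"N_s={n_s:2d}  rate={rate:6.2f} Mbit/s  next IR step +{step:.0%}")
    above_25 = [r for _, r, _ in rate_ladder(LINK_MBPS, MAX_RATE_MBPS, min_rate=25.0) if r > 25.0]
    print(above_25)
```

An IR step from 37.5 to 50 Mbit/s is a 33% jump, while near 5 Mbit/s a step is only about 3%, consistent with the observation that granularity hurts at medium loads and becomes negligible at high loads.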

Fig. 6 reports the results obtained with adaptive shaping when the TCP connections have different lengths; this figure can be compared with Fig. 3. In this case a mechanism that controls the transmission rate and enforces fairness among connections leads to a striking performance improvement: all TCP connections have efficiency one, and all of them receive the same amount of bandwidth, which corresponds to a very high fraction of the available bandwidth. It is quite interesting to notice that the overall performance in this case is better than that obtained when all TCP connections are 1000 km long. Since shorter connections are easier to control, this means that all connections benefit from the presence of short connections: exactly the opposite of the behaviour observed for TCP connections without adaptive shaping.

2.2  The  Italian  network 

The candidate Italian network topology comprises ten ATM switches, located in the major Italian cities, and is shown in Fig. 7; the buffering capacity at all nodes is set either to 100 or to 1000 cells per port, and the user transmission buffers are set to quite large values in order to avoid losses at the source. Six TCP connections are considered: Mi-Ro, To-Fi, Ve-To, Ro-Ba, Ba-Pa and Pa-Ba. The total amount of background traffic in the network is set to 0.8, 1.0 or 1.2 Gbit/s, and the traffic distribution is highly asymmetric, the network having essentially two hot spots, in Rome and Milan. The complete workload distribution for the background traffic is reported in Table 1. Table 2 presents the background load, measured as a fraction of the link capacity, on all the links crossed by the six TCP connections.
Figure 7 Topology of the Italian network
Table 1 Traffic matrix used in the simulation of the Italian topology; the traffic is generated by the node in the column and goes to the node in the row; the relations are expressed in thousandths of the global generated traffic
Table 2 Measured load of the background traffic, as a fraction of the link capacity, on the links crossed by the TCP connections

            0.8 Gbit/s              1.0 Gbit/s              1.2 Gbit/s
TCP conn    1st    2nd    3rd       1st    2nd    3rd       1st    2nd    3rd
Mi-Ro       0.41                    0.51                    0.61
To-Fi       0.294  0.114  0.11      0.367  0.143  0.13      0.44   0.165  0.15
Ve-To       0.30                    0.377                   0.43
Ro-Ba       0.25   0.06             0.32   0.074            0.38   0.088
Ba-Pa       0.06   0.028            0.074  0.035            0.088  0.042
Pa-Ba       0.028  0.06             0.035  0.074            0.042  0.088


All the TCP connections have a window size that allows a transmission speed of up to 150 Mbit/s if no shaping is applied by the source. The Mi-Ro TCP connection is carried on a link with plenty of available capacity, since 60%, 50% or 40% of a 600 Mbit/s channel is available to it in the three background load cases, respectively. The Ba-Pa and Pa-Ba connections run on very lightly loaded links, while the three TCP connections To-Fi, Ve-To and Ro-Ba run over 150 Mbit/s channels whose loads could significantly influence TCP performance. The Ro-Ba and Pa-Ba connections interfere with one another on the Na-Ba link.

We present results for either shaped or unshaped background traffic. Simulations were run considering five possible scenarios for all the TCP connections: unshaped TCP connections; TCP connections shaped at a rate of 25, 37.5 or 50 Mbit/s, which means that the TCP goodput can be at most 22.5, 33.8 or 45.1 Mbit/s, respectively; and TCP connections with adaptive shaping. All these scenarios are simulated with node buffer lengths of either 100 or 1000 cells.
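The goodput limits quoted above lie just below the cell-level bound obtained by discarding the 5-byte header of each 53-byte cell. The sketch below computes only that bound; the values in the text (22.5, 33.8 and 45.1 Mbit/s) are slightly lower, consistent with additional higher-layer overhead (for example TCP/IP headers) that this sketch does not model.

```python
CELL_BYTES = 53
PAYLOAD_BYTES = 48


def cell_payload_bound_mbps(shaping_rate_mbps: float) -> float:
    """Upper bound on goodput for a cell stream shaped at the given rate (cell header overhead only)."""
    return shaping_rate_mbps * PAYLOAD_BYTES / CELL_BYTES


if __name__ == "__main__":
    for rate in (25.0, 37.5, 50.0):
        print(f"shaped at {rate:4.1f} Mbit/s -> goodput at most {cell_payload_bound_mbps(rate):.1f} Mbit/s")
```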

Figures 8 and 9 present the goodput and efficiency of the six TCP connections as a function of the global background network load; the curves refer to TCP connections shaped at 25, 37.5 or 50 Mbit/s, or unshaped, with background traffic either shaped or unshaped; the node buffers are 100 cells long. Figures 10 and 11 report the results for the same scenarios, but with a node buffer size of 1000 cells.

The first consideration concerns the buffering capacity within the ATM switches. When the buffers are quite small (100 cells), the burstiness of the background traffic has a great impact on the performance of the TCP connections. Conversely, when the buffering capacity is large enough to absorb the bursts of cells generated by the background traffic sources (1000-cell buffers), the influence of source burstiness on the TCP connections becomes negligible.

Let’s  now  consider  each  one  of  the  six  connections  separately,  since  all  of  them  exhibit 
peculiar  behaviours  that  are  worth  discussing. 

Figure 8 Average goodput and efficiency of the TCP connections for the Italian network as a function of the background load, with 100-cell node buffers; solid lines refer to shaped background traffic, whereas dashed lines refer to unshaped background traffic

Figure 9 Average goodput and efficiency of the TCP connections for the Italian network as a function of the background load, with 100-cell node buffers; solid lines refer to shaped background traffic, whereas dashed lines refer to unshaped background traffic

Figure 10 Average goodput and efficiency of the TCP connections for the Italian network as a function of the background load, with 1000-cell node buffers; solid lines refer to shaped background traffic, whereas dashed lines refer to unshaped background traffic

Figure 11 Average goodput and efficiency of the TCP connections for the Italian network as a function of the node buffer size and the background load, with 1000-cell node buffers; solid lines refer to shaped background traffic, whereas dashed lines refer to unshaped background traffic

The Mi-Ro connection has plenty of spare bandwidth to exploit; it thus operates with efficiency one, grabbing all the resources it can, in all cases except one: if neither the TCP traffic nor the background traffic is shaped and the node buffers are small (100 cells), losses occur in the node buffer, so that both the TCP goodput and efficiency decrease significantly as the background load increases.

The To-Fi and Ve-To connections have a similar amount of available resources to exploit, and they behave similarly. Both of them completely use their assigned bandwidth when both their traffic and the background traffic are shaped, or when the node buffers are large enough to absorb the burstiness of the background traffic. On the other hand, both connections suffer significantly when their traffic is not shaped, obtaining very poor efficiency and a remarkable reduction in goodput. It is important to observe that the missing points in the graphs, for instance those referring to the To-Fi connection without shaping of the background and TCP traffic with 100-cell buffers, correspond to simulations where the TCP connections were closed by the TCP backoff mechanism. This behaviour clearly shows that without some kind of rate control the TCP flow control mechanism is not able to work properly in high speed networks. The last effect to be observed is that, when the buffer size is 1000 cells and the TCP traffic is not shaped, TCP achieves better performance if the background traffic is not shaped either; this is because, when the background traffic is not shaped, cells are lost in bursts, which concentrates the losses on a smaller number of TCP segments.

Let us now turn to the connections that interact in Naples: Ro-Ba and Pa-Ba. Here too some points are missing because TCP connections were closed, and the general considerations presented above apply. Moreover, the interaction between the two TCP connections on a lightly loaded link does not seem to jeopardize performance.

Finally, consider the Ba-Pa connection. This connection runs alone on a lightly loaded set of links, and its performance is consequently quite good. The most interesting aspect to be noted in this case is what happens to the TCP connection when no shaping is used and the buffer size is increased from 100 to 1000 cells. The goodput of the connection increases significantly with the larger buffers, but the efficiency remains very poor, below 0.8, and is in fact even reduced when the buffer size increases. This phenomenon once again confirms that TCP by itself is not suited to high speed networks, since it wastes a large amount of resources.

Figure 12 Average goodput and efficiency of the TCP connections in the case of adaptive shaping for the Italian network as a function of the node buffer size and the background load

Figure 12 refers to the case of adaptive shaping, with the characteristics described in the previous section but with no enforcement of fairness among connections. Circular markers refer to 100-cell buffers and square markers to 1000-cell buffers; black markers refer to unshaped background traffic, white markers to shaped background traffic. The buffer thresholds are set to $T_l = 5$ and $T_h = 50$ for a buffer size of 100 cells, and to $T_l = 10$ and $T_h = 500$ for a buffer size of 1000 cells. The maximum transmission rate is 75 Mbit/s for all connections, except Mi-Ro, which is allowed to transmit at up to 150 Mbit/s. First of all, when buffers are 1000 cells long all connections attain efficiency 1, which is quite a success in itself; moreover, the throughput of the connections is higher than that obtained without shaping or with fixed shaping. With 100-cell buffers, too, the benefits of an adaptive shaping policy are quite evident: both the efficiency and the throughput are higher than without shaping. In this case, however, the buffers are not large enough to allow smooth control of the sources, i.e., they cannot accommodate all the cells transmitted at high speed by a source before it receives the RM cell with DR feedback; hence the efficiency of the TCP connections is not always one. Notice that, in these conditions, the To-Fi connection is still forced to close when the background load is 1.2 Gbit/s.

One more aspect worth noticing is the behaviour of the Pa-Ba connection with 100-cell buffers. In this case the throughput of the Pa-Ba connection increases as the background load increases. This is due to the fact that for this connection the main bottleneck is the Na-Ba link, where most of the traffic is due to TCP, while the competing TCP connection also suffers from the quite heavily loaded Ro-Na link; hence, as the background traffic increases, the Ro-Ba connection is forced to reduce its rate, and the Pa-Ba connection can exploit the bandwidth on the Na-Ba link freed by the Ro-Ba connection. This shows that an adaptive shaping scheme is indeed able to dynamically exploit the available bandwidth. Moreover, it seems that, if the number of connections competing for the bandwidth is small, an adaptive shaping scheme may work well even if fairness is not enforced by the network.


3  CONCLUSIONS 

The performance of the TCP protocol running over ATM networks was studied through simulation in two network scenarios, considering the goodput and efficiency of the TCP connections as the significant performance parameters.

The variable parameters of the study are the background traffic load, the buffering capacity of the ATM switches, the traffic shaping parameters of both TCP and background traffic, and the length of the TCP connections.

Numerical results clearly show that shaping the traffic on the TCP connections greatly improves TCP performance, and also indicate that ABR-like solutions may be quite advantageous.

An important advantage of the shaping approach for TCP connections is that shaping techniques can be applied with no modification of the TCP protocol itself.
Abstract

Admission control and congestion control can provide traffic guarantees in ATM networks. However, some users may not be able to describe their traffic accurately enough for the network to provide such guarantees. By sending a dynamic feedback signal about the current utilisation of network resources, the network could provide loss guarantees to users who respond appropriately, even without prior traffic descriptors. One possible feedback signal is a price per unit of network resource, based on the network load level: when the load is high, the price is high, and when the load is low, the price is low or zero. We outline a distributed iterative pricing algorithm, and show through simulations that it can simultaneously increase both network and economic efficiency. We also explore some arguments often raised against usage-sensitive pricing, and provide some counter-arguments.


Keywords 

ATM  Networks,  Pricing,  Dynamic  Feedback,  Congestion  Control 


1  INTRODUCTION 

Asynchronous Transfer Mode (ATM) has been adopted as the transfer mode for the Broadband Integrated Services Digital Network (BISDN), a service-independent network capable of supporting all the communication services that users now require or may require in the future (see e.g. de Prycker 1993). ATM is also emerging as a local area networking technology, since it provides flexible bandwidth-on-demand and internetworking capabilities for conventional data communications. ATM networks are therefore expected to accommodate a wide range of users, including some whose applications require guarantees on cell loss and/or delay. These guarantees could be deterministic worst-case guarantees or less stringent statistical guarantees. Some users may be satisfied with best-effort service, for which the network offers no guarantees on loss or delay.

Admission control and congestion control can provide performance guarantees and are
therefore two of the most important ATM network functions. In order to obtain these
guarantees from the network, users have to describe their traffic inputs by specifying
values for network-defined traffic descriptors such as peak cell rate (PCR) or sustainable
cell rate (SCR). However, some users may not be able to describe their traffic
accurately, either because their applications cannot be sufficiently well-characterised by
the given traffic descriptors, or because their actual traffic inputs depend on factors
outside user control (such as the number of active applications competing for access to a
server). A common assumption in many proposed admission control schemes is that traffic
which is not well-described cannot get specific guarantees beyond the level of service
being provided to best-effort traffic.

The ATM Forum has recognised the problem of providing guarantees to users whose
traffic cannot be well-described, and in response has developed a specification for
Available Bit Rate (ABR) service, e.g. Ramakrishnan (1995). Users who choose ABR service
receive feedback from the network about the current level of network resource utilisation,
and can get cell loss guarantees¹ if they respond appropriately, for example by reducing
their input rates in times of congestion.

ABR  service  is  therefore  suitable  for  users  whose  applications  are  flexible  with  respect 
to  delay  but  not  necessarily  to  loss.  This  flexible  behaviour  represents  a  tool  that  network 
operators  can  use  to  increase  network  utilisation  while  continuing  to  serve  guaranteed 
traffic  such  as  CBR  and  VBR  applications.  In  addition,  this  type  of  network  feedback 
could  modify  an  adaptive  user’s  traffic  at  the  source  rather  than  after  it  has  been  injected 
into  the  network.  This  would  help  to  localise  the  effects  of  feedback  to  the  edges  of  the 
network  and  allow  simpler  internal  network  operation. 

Most  suggestions  for  supporting  ABR  service  assume  that  well-described  traffic  which 
requires  performance  guarantees  gets  priority  in  the  use  of  network  resources  such  as 
bandwidth  or  buffer  space,  and  that  the  remaining  resources  are  fairly  shared  among  the 
ABR  users.  Two  issues  which  are  not  explicitly  addressed  are 

• why more “demanding” traffic should get priority over ABR traffic;

• what constitutes “fair” sharing. Should the available bandwidth be shared equally
among all ABR users, for instance? Or should it be shared according to the various
application requirements?

¹ No specific delay guarantees can be provided, so ABR users must be prepared to absorb
delays at the traffic source before being allowed to input traffic into the network.
It is important to note that the fact that such issues are not addressed explicitly does
not mean that these suggestions are neutral on what are often regarded as policy issues.
On the contrary: sharing the available bandwidth equally among all ABR users values
all such traffic equally, although the users themselves may put widely differing values on
their service; giving CBR and VBR users priority over ABR users ignores the possibility
that ABR users may value network access more than users with well-described traffic
sources. We are not saying that these assumptions are wrong or undesirable; instead,
we advocate allowing the users themselves to resolve these issues.

Admission  control  and  congestion  control  in  ATM  are  difficult  problems  which  so  far 
have  not  been  satisfactorily  solved.  Two  key  questions  are 

• how should congestion be defined and measured? This is a difficult question
because individual user requirements vary considerably, so that one user may think
the network is congested while another does not; and because in internetworks the
responsibility for detecting congestion may be distributed among several network
operators, each of which applies a different test at its bottleneck points.

• how should limited resources be allocated under congestion? Some proposals
call for users to indicate the relative priority of their traffic, leading to the
problem of providing incentives so that users do not all choose the highest priority.

Our  aim  in  this  article  is  to  propose  a  dynamic  feedback  control  scheme  which  explicitly 
addresses  these  issues. 


2  DIFFERENT  TYPES  OF  EFFICIENCY 

A network is as good, or as bad, as its users perceive it to be. This leads to the conclusion
that network performance should be measured in terms of overall user satisfaction with
the service they receive. Network engineering measures (such as average packet delay
or loss rate) are inadequate reflections of user satisfaction when user requirements vary
widely.

Due to the difficulty in accounting for individual users’ requirements, however, aggregate
network-oriented performance measures are usually used in design and operations
problems. Usage is divided into classes according to application requirements and traffic
characteristics; for example, real-time video, real-time audio, or off-line file transfer. Each
class is regarded as having a single representative user for analytical and control purposes,
and class objectives are used to drive the network control process. Therefore, the loop is
not closed all the way to the users when making operational decisions.

We propose to bring the users back into the loop and thereby ensure that performance
measures are user-oriented, as shown in Figure 1. A user-oriented network control scheme
would take user valuations into account: the network could serve higher-value users
even under congestion by temporarily denying access to lower-value users. Such
time-smoothing would not upset users who can tolerate longer delays, while it would improve
the network’s value to users who get greater benefits from immediate access.


Figure  1  Network  design  and  control  loops. 


Each user privately decides how much they value network access; our scheme involves
giving them incentives to do this. Users would gain by obtaining service more closely
matched to their needs; network operators would gain through improved network
utilisation and increased user satisfaction with the service they receive. We hope to achieve
the same (or better) network performance as with conventional congestion control and
resource allocation schemes, while at the same time increasing the total value of the network
from the users’ point of view. Network engineering measures will continue to be important,
but we believe that user preferences should be the primary consideration driving
resource allocation and congestion control schemes.

We need to distinguish two very different notions of efficiency:

•  Network  efficiency  refers  to  the  utilisation  of  network  resources  such  as  bandwidth 
and  buffer  space. 

•  Economic  efficiency  refers  to  the  relative  valuations  the  users  attach  to  their 
network  service. 

If  a  network  can  maintain  an  acceptable  level  of  service  while  minimising  the  resources 
necessary  to  provide  this  service,  we  say  that  its  operation  is  network  efficient.  If  no  user 


Feedback  and  pricing  in  ATM  networks 




currently receiving a particular Quality of Service (QOS) values it less than another user
who is being denied that QOS, we say that its operation is economically efficient.
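The definition of economic efficiency above can be expressed as a simple check. This is an illustrative sketch only. It assumes a single QOS level and that each user's private valuation of that QOS is known to whoever runs the check. The function name and the sample valuations are hypothetical.

```python
from typing import Sequence


def is_economically_efficient(served_values: Sequence[float],
                              denied_values: Sequence[float]) -> bool:
    """True if no user receiving the QOS values it less than a user denied it.

    Hypothetical helper: each entry is one user's private valuation of a
    single QOS level.
    """
    if not served_values or not denied_values:
        return True  # nobody is displaced by anybody else
    # The lowest-valued served user must value the QOS at least as much as
    # the highest-valued denied user.
    return min(served_values) >= max(denied_values)


print(is_economically_efficient([9.0, 7.5, 6.0], [5.0, 2.0]))  # True
print(is_economically_efficient([9.0, 3.0], [5.0]))            # False
```

In practice the network does not observe these valuations directly, which is why the paper turns to price signals that let users act on them.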

An obvious question is: why will either type of efficiency continue to be important?
Some observers have suggested that the widespread deployment of fibre optic lines, and
continuing exponential decreases in processor and memory costs, will result in these
network resources becoming essentially “free”, so that efficiency in their use will not be
important in the future and all users can always be accommodated. We do not believe
these arguments apply in the short or medium term, if indeed they will ever apply. User
demands are increasing exponentially, so that it is not clear when, if ever, network
resources will be “free”. Experience suggests that application developers will have no
difficulty in designing new services that use up all available resources, perhaps after an
initial adjustment period. And market economics dictates that commercial network
operators should be aware of the differing valuations that users attach to the same level
of network performance. The same considerations apply to privately owned or operated
networks: the ultimate goal will continue to be to maximise some measure of the value
of using the network.


2.1  IMPROVING  EFFICIENCY  WITH  FEEDBACK 

Users with flexible traffic inputs can help to increase network efficiency if they are given
appropriate feedback signals. When the network load is high, the feedback should
discourage these users from inputting traffic; when the load is low, the feedback should
encourage them to send any traffic they have ready to transmit. Instead of regarding their
load as fixed, the network uses the flexibility of these users as part of a congestion control
and avoidance strategy. One possible feedback signal is a price based on the level of network
load: when the load is high, the price is high, and when the load is low, the price is low
or zero.

Similarly, by associating a cost measure with network loading, all users can be signalled
with the prices necessary to recover the cost of the current network load. Price-sensitive
users (those willing and able to respond to dynamic prices) increase economic efficiency
by choosing whether or not to input traffic according to their individual willingness to pay
the current price. Users who value network service more will choose to transmit, while
those who value it less will wait for a lower price.
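The threshold behaviour just described can be sketched as follows. This is an illustration, not the paper's model. It assumes that each user wants one unit of resource and compares a single per-unit valuation with the posted price. The user names, capacity, and candidate price grid are hypothetical.

```python
from typing import Optional


def transmitting_users(valuations: dict[str, float], price: float) -> list[str]:
    """Users whose willingness to pay is at least the current price transmit."""
    return [user for user, value in valuations.items() if value >= price]


def lowest_clearing_price(valuations: dict[str, float], capacity: int,
                          candidate_prices: list[float]) -> Optional[float]:
    """Smallest candidate price at which demand (one unit per user) fits capacity."""
    for price in sorted(candidate_prices):
        if len(transmitting_users(valuations, price)) <= capacity:
            return price
    return None  # no candidate price is high enough


users = {"a": 0.9, "b": 0.4, "c": 0.7}
print(transmitting_users(users, 0.0))  # light load, zero price: everyone sends
print(lowest_clearing_price(users, capacity=2,
                            candidate_prices=[0.0, 0.25, 0.5, 0.75, 1.0]))
```

At the clearing price of 0.5, users a and c transmit while b waits for a lower price, so the two available units go to the two users who value them most.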

Price signals thus have the potential to increase both network and economic efficiency,
though whether a particular pricing scheme increases either notion of efficiency depends
on the implementation. One important point needs to be clarified:

• contradictory though it sounds, a scheme based on pricing principles does
not necessarily involve money. For example, in a private network where one
organisation controls all the users, or in a company’s virtual private network, the
“prices” are simply control signals. In this case, the users’ applications could be
programmed to obtain a desirable traffic mix, to enforce priorities, or to achieve
some other objective.

We envisage that the charge to a user in an ATM network might have many components,
such as a connection fee, a charge per unit time or per unit of bandwidth,
premium charges for certain services, and so on. We suggest that there should also be
a usage-sensitive component during congestion, to increase both network and economic
efficiency. We propose charging only when network congestion indicates that some users
may be experiencing QOS degradation, with the size of the charges related to the degree of
congestion. If the network is lightly loaded and all users are getting acceptable QOS, the
usage-sensitive prices would be zero.
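A charge built from the components listed above might be computed as in the sketch below. The paper does not specify how the usage-sensitive price depends on congestion. The linear-above-threshold rule, the `Tariff` fields, and all numbers here are hypothetical. They were chosen only to show a usage component that is zero whenever the link is lightly loaded.

```python
from dataclasses import dataclass


@dataclass
class Tariff:
    connection_fee: float        # one-off charge per connection
    per_second: float            # charge per unit of connection time
    congestion_threshold: float  # link load below which the usage price is zero
    price_slope: float           # per-cell price per unit of load above the threshold


def usage_price(load: float, tariff: Tariff) -> float:
    """Hypothetical per-cell usage-sensitive price for one feedback interval."""
    return max(0.0, load - tariff.congestion_threshold) * tariff.price_slope


def total_charge(duration_s: float, intervals: list[tuple[float, int]],
                 tariff: Tariff) -> float:
    """intervals holds (link load, cells sent) for each feedback interval."""
    usage = sum(usage_price(load, tariff) * cells for load, cells in intervals)
    return tariff.connection_fee + tariff.per_second * duration_s + usage


tariff = Tariff(connection_fee=1.0, per_second=0.5,
                congestion_threshold=0.75, price_slope=2.0)
# One lightly loaded interval (no usage charge) and one congested interval.
print(total_charge(10.0, [(0.5, 1000), (0.875, 1000)], tariff))  # 256.0
```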

We recognise that many people are concerned about the use of pricing in network
operations. Concerns range from questions about the feasibility and overhead of
usage-sensitive pricing, to policy issues such as profit opportunities and fairness. We
believe that a clear understanding of the nature of what is being proposed is necessary on
all sides. Therefore we first outline our proposed dynamic pricing scheme and some
preliminary simulation results, and then address some of the objections often raised in
discussions of dynamic network pricing.


2.2  DISTRIBUTED  ITERATIVE  PRICING  ALGORITHM 

It is important to note that our proposed pricing algorithm would only be applied to
adaptive users, who are able and willing to respond to dynamic prices during a connection
by changing their offered traffic. All other users would be charged according to
another pricing scheme. How to co-ordinate the various pricing schemes to achieve some
overall objective (such as fairness) is a complex issue and we do not address it in this
article.

The  network  and  its  users  are  considered  to  form  an  economic  system.  The  system 
has  various  resources  such  as  link  bandwidths  and  buffer  spaces  that  can  be  used  to 
meet  user  demands  for  service.  Network  constraints  such  as  buffer  sizes  or  link  capacities 
are  translated  into  cost  functions  on  the  demands  for  resources.  The  basic  property  of 
these  cost  functions  is  that  marginal  cost  should  go  to  infinity  as  usage  of  the  resource 
approaches  capacity. 
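One cost function with this property is the M/M/1-style congestion cost C(x) = x / (K - x), for a resource of capacity K carrying load x. It is offered here only as an illustration; the paper does not say which function it uses. Its marginal cost C'(x) = K / (K - x)^2 is finite at light load and grows without bound as x approaches K. The function names and the 155 Mbps capacity in the example are assumptions.

```python
import math


def cost(load: float, capacity: float) -> float:
    """Illustrative congestion cost x / (K - x); infinite at or above capacity."""
    if load >= capacity:
        return math.inf
    return load / (capacity - load)


def marginal_cost(load: float, capacity: float) -> float:
    """Derivative of cost(): K / (K - x)**2, unbounded as load -> capacity."""
    if load >= capacity:
        return math.inf
    return capacity / (capacity - load) ** 2


for load in (0.0, 100.0, 140.0, 150.0, 154.0):
    print(f"load {load:5.1f} Mbps -> marginal cost {marginal_cost(load, 155.0):.4f}")
```

The marginal cost rises from under 0.01 at zero load to 155 at 154 Mbps, which is the behaviour the pricing scheme relies on to keep usage below capacity.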

Each adaptive user is viewed as placing a benefit, or willingness-to-pay, on the resources
they are allocated. Given a price per unit of bandwidth or buffer space, a user’s benefit
function completely determines that user’s traffic input. A benefit function could follow
the usual economic assumption of diminishing incremental benefit as more of the resource
is consumed; see Figure 2(a). Or the user could apply a threshold rule, or series of
threshold rules, for deciding how much of the resource to request based on the current
price; see Figures 2(b) and 2(c).
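The three kinds of benefit function can be turned into demand rules as in the sketch below. The specific forms are assumptions made for illustration, not the curves plotted in Figure 2. A logarithmic benefit w*log(1 + x) stands in for diminishing incremental benefit: the user buys until the marginal benefit w / (1 + x) equals the price p, giving x = w/p - 1. The threshold values and function names are hypothetical.

```python
def demand_concave(price: float, weight: float, cap: float) -> float:
    """Diminishing benefit weight*log(1 + x): buy until weight/(1 + x) == price."""
    if price <= 0:
        return cap  # a free resource is used up to the user's own limit
    return min(cap, max(0.0, weight / price - 1.0))


def demand_threshold(price: float, max_price: float, amount: float) -> float:
    """Single threshold rule: request `amount` only if the price is acceptable."""
    return amount if price <= max_price else 0.0


def demand_stepped(price: float, tiers: list[tuple[float, float]]) -> float:
    """Series of thresholds: tiers are (max_price, amount) pairs, with cheaper
    tiers carrying larger amounts; the cheapest acceptable tier applies."""
    for max_price, amount in sorted(tiers):
        if price <= max_price:
            return amount
    return 0.0


tiers = [(0.2, 100.0), (0.5, 40.0), (1.0, 10.0)]
for p in (0.1, 0.3, 0.8, 2.0):
    print(p, demand_concave(p, weight=10.0, cap=100.0),
          demand_threshold(p, max_price=0.5, amount=50.0),
          demand_stepped(p, tiers))
```

In each case the network only needs to see the resulting demand, not the benefit function that produced it.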
Figure 2 Possible user benefit functions.
The network operator sets the prices so that the marginal benefit the users place on
their resource consumption is equal to the marginal cost of handling the resulting traffic
in the network². The network operator dynamically adjusts the prices based on current
network conditions. Because each user responds to the posted price with its own traffic,
it is not necessary for the network operator to know the user benefit functions; therefore
our pricing scheme is suitable for public as well as private networks.

Time is divided into feedback intervals, within each of which the prices and user
benefits are fixed. This model allows users to change their benefit functions every
feedback interval, to reflect their satisfaction with the level of service received or
their time constraints on having their cells accepted into the network, so the examples in
Figure 2 are for a particular interval. Similarly, the network recalculates the prices every
feedback interval to reflect current resource usage³.

A distributed iterative pricing algorithm for adaptive users has been developed, e.g.
Murphy (1994), Murphy and Posner (1994); see Figure 3. The computation required per
iteration at each user and ATM access switch is simple, which suggests that inexpensive
processing elements may be sufficient for executing the algorithm.
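The algorithm itself is given in Murphy (1994) and Murphy and Posner (1994) and is not reproduced here. The sketch below only illustrates the message pattern within one feedback interval: the network posts a price, each user computes its own demand locally, and the network compares the marginal cost of the reported load with the price. It substitutes plain bisection for the authors' update rule. It also assumes the illustrative cost x / (K - x) and logarithmic user benefits. All names and numbers are hypothetical.

```python
import math
from typing import Callable, Sequence

Demand = Callable[[float], float]  # a user's local response to a posted price


def marginal_cost(load: float, capacity: float) -> float:
    """Illustrative marginal cost K / (K - x)**2; infinite at capacity."""
    if load >= capacity:
        return math.inf
    return capacity / (capacity - load) ** 2


def concave_user(weight: float, cap: float) -> Demand:
    """Hypothetical user with benefit weight*log(1 + x); demand weight/p - 1."""
    def demand(price: float) -> float:
        if price <= 0:
            return cap
        return min(cap, max(0.0, weight / price - 1.0))
    return demand


def iterate_price(users: Sequence[Demand], capacity: float,
                  lo: float = 0.0, hi: float = 100.0,
                  rounds: int = 60) -> tuple[float, float]:
    """Find p with p == marginal_cost(total demand at p) by bisection.

    Assumes marginal cost exceeds the price at `lo` and falls below it at
    `hi`. Users never reveal benefit functions, only their demands.
    """
    for _ in range(rounds):
        price = (lo + hi) / 2
        load = sum(user(price) for user in users)  # users reply with demands
        if marginal_cost(load, capacity) > price:
            lo = price  # traffic costs more than users pay: raise the price
        else:
            hi = price  # price exceeds marginal cost: demand held back needlessly
    price = (lo + hi) / 2
    return price, sum(user(price) for user in users)


users = [concave_user(w, cap=155.0) for w in (40.0, 60.0, 80.0)]
price, load = iterate_price(users, capacity=155.0)
print(f"price {price:.3f}, load {load:.1f} Mbps of 155")
```

For these hypothetical users the search settles near a price of about 1.23, with roughly 144 Mbps of offered load. At that point each user's marginal benefit equals the price, and the price equals the marginal cost of the total load. In the paper's setting this computation would be repeated every feedback interval.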

² These prices only address the variable costs corresponding to network constraints.

³ The network and the users may use prediction in their decisions without invalidating this model.
Figure 3 Distributed iterative pricing algorithm for adaptive users.

There are several different types of adaptive users, e.g. Murphy (1995), depending on
whether the user is flexible with respect to loss, delay, or both. So far we have modelled
two types of adaptive user:

•  Inelastic.  This  user  requires  a  delay  bound  on  their  traffic,  but  can  tolerate  sending 
only  a  fraction  of  the  cells  that  are  ready  to  transmit  in  the  current  interval.  We 
assume  that  cells  not  sent  in  the  interval  are  useless  to  the  user  and  are  discarded. 
For  example,  this  might  be  the  second  level  of  a  two-level  video  codec.  The  first 
level  contains  the  minimum  necessary  information,  and  would  be  transmitted  as  a 
non-adaptive  application.  The  second  level  consists  of  enhancement  information.  It 
is  not  essential  that  all  of  the  information  be  delivered;  however,  a  delay  guarantee 
is  required  —  if  the  enhancement  information  does  not  arrive  before  the  playback 
point,  it  is  considered  useless. 

•  Elastic.  This  type  of  user  waits  until  feedback  from  the  network  indicates  that  they 
can  input  traffic,  then  transmits  and  requires  that  their  cells  are  not  lost  in  the 
network.  Each  elastic  user  decides  individually  what  their  transmission  criteria  are, 
e.g.  the  maximum  price  per  cell  they  are  willing  to  pay.  An  example  of  an  elastic 
user  type  would  be  a  non-real-time  data  transfer  with  no  ARQ  capability,  where 
already-transmitted  cells  are  not  buffered  at  the  sender. 
The mathematical models used for inelastic and elastic users are described in Murphy
(1996), along with the equations governing their responses to the dynamic prices from the
network.
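As a purely illustrative stand-in for the models in Murphy (1996), the two behaviours described above might be coded as follows. These are not the paper's models. The per-cell values of the inelastic user, the elastic user's per-interval cell limit, and the class names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class InelasticUser:
    """Sends only the cells worth at least the price; the rest expire."""
    cell_values: list[float]  # hypothetical value of each cell ready this interval

    def respond(self, price: float) -> tuple[int, int]:
        sent = sum(1 for value in self.cell_values if value >= price)
        discarded = len(self.cell_values) - sent  # unsent cells are useless later
        self.cell_values = []                     # nothing carries over
        return sent, discarded


@dataclass
class ElasticUser:
    """Waits while the price exceeds its limit, then sends from its backlog."""
    max_price: float             # maximum price per cell the user will pay
    max_cells_per_interval: int  # hypothetical per-interval sending limit
    backlog: int = 0             # cells waiting at the source

    def respond(self, price: float) -> int:
        if price > self.max_price:
            return 0  # wait for a lower price; the backlog is kept, not lost
        sent = min(self.backlog, self.max_cells_per_interval)
        self.backlog -= sent
        return sent


video = InelasticUser(cell_values=[0.9, 0.5, 0.2, 0.7])
print(video.respond(price=0.6))         # (2, 2)
data = ElasticUser(max_price=0.5, max_cells_per_interval=100, backlog=250)
print(data.respond(0.8), data.backlog)  # 0 250
print(data.respond(0.4), data.backlog)  # 100 150
```

The key difference is what happens to unsent cells. The inelastic user discards them at the end of the interval, while the elastic user keeps them queued at the source until the price falls.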


3  SIMULATION  MODELS  &  RESULTS 

The simulated network is a high-speed 155 Mbps ATM link shared by inelastic and
elastic users. Video sources are modelled as inelastic users; data sources are modelled
as elastic users. The link model is shown in Figure 4. The network and source models
were simulated using SES/workbench, e.g. SES (1992), a discrete-event simulator that
allows hardware and software simulation. The models were mainly created using its
graphical user interface. SES/workbench compiles the graphical code to C and creates an
executable. The simulation execution platform was a cluster of Sparc-10 workstations.
Figure 4 Simulation model for economic efficiency.

The simulation model is made up of submodules, each of which performs a well-defined
function. The sources generate traffic which is input to a network interface submodule.
The network interface takes the source bit stream and forms ATM cells. The cell stream
from an interface is then input to the ATM switch buffer submodule. This submodule
smooths the arrival of cells to the ATM network and so takes care of cell-scale congestion.
The switch buffer is the limited network resource in our model.

The video source model that we use here is a standard one for video conferencing, e.g.
COST (1993). The codec has a compressed bit rate of 2.3 Mbps, which adheres to the
H.261 standard for video, e.g. Murphy and Teahan (1994). There are 20 video sources,
each with a mean of 2.3 Mbps and a peak of 5 Mbps. These are input at a rate of 30
frames per second, all synchronised together, so that the inelastic users are (more or
less) stationary on the millisecond scale.

The elastic users can be thought of as one user with many files to transfer independently,
many users each with one file, or some combination of these types. The negotiations for
file transfer or connection set-up only occur when a new video frame is to be sent, i.e.
every 1/30 second, because this simplifies the simulation and makes it possible to speed
up the run time. Therefore the network renegotiates the PCR with the elastic users every
1/30 second. An empirical distribution for file size ranges was obtained from actual files
stored on one of our computers. In the simulations a range was chosen according to this
empirical distribution, and then a file size was chosen from a uniform distribution within
this range. The peak-to-mean ratio of this source can be high, with values of up to around
1000. The number of data sources in use is taken from a uniform distribution between 1
and 39 sources. Each file to be sent is taken from a uniform distribution between 20 and
660 cells. The average bit rate of a single data source is therefore about 4.3 Mbps.
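The quoted 4.3 Mbps can be reproduced with a little arithmetic. This rests on two assumptions of our own, since the text does not state them explicitly: each active data source sends one file per 1/30 s renegotiation interval, and the full 53-byte ATM cell, header included, is counted.

```python
CELL_BITS = 53 * 8          # one ATM cell: 48-byte payload plus 5-byte header
RENEGOTIATIONS_PER_S = 30   # one file per video frame time (assumption)

mean_file_cells = (20 + 660) / 2  # mean of the uniform file-size distribution
rate_bps = mean_file_cells * CELL_BITS * RENEGOTIATIONS_PER_S
print(f"{rate_bps / 1e6:.2f} Mbps")  # 4.32 Mbps
```

Counting only the 48-byte payload would give about 3.9 Mbps, so the 4.3 Mbps figure appears to count whole cells.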

In our proposed scheme a price is generated by the network based on the present state
of the network buffer, and the sources adapt their demands based on this price. What
we propose and simulate adheres to the UNI 3.0 specification from the ATM Forum, e.g.
ATM Forum (1993). A leaky bucket is also implemented on top of our scheme so that, if
cells must be dropped, those marked by the leaky bucket are discarded first.
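UNI 3.0 defines traffic conformance using the Generic Cell Rate Algorithm (GCRA), a leaky-bucket test. Non-conforming cells may be tagged by setting their cell loss priority (CLP) bit to 1, which makes them the first candidates for discard. The paper does not say which variant or parameters it used. The sketch below shows the virtual-scheduling form of GCRA(T, tau), with hypothetical parameter values and class names.

```python
from dataclasses import dataclass


@dataclass
class GCRA:
    """Virtual-scheduling form of GCRA(T, tau)."""
    increment: float  # T: nominal time between cells at the contracted rate
    limit: float      # tau: tolerance for cells arriving early
    tat: float = 0.0  # theoretical arrival time of the next cell

    def conforming(self, arrival: float) -> bool:
        if self.tat < arrival:
            self.tat = arrival   # source was idle: restart the schedule
        if self.tat > arrival + self.limit:
            return False         # too early: non-conforming, TAT unchanged
        self.tat += self.increment
        return True


def clp_bits(arrivals: list[float], gcra: GCRA) -> list[int]:
    """CLP 0 for conforming cells, 1 for tagged cells (discarded first)."""
    return [0 if gcra.conforming(t) else 1 for t in arrivals]


print(clp_bits([0.0, 0.2, 0.4, 1.0, 3.0], GCRA(increment=1.0, limit=0.5)))
# [0, 1, 1, 0, 0]
```

The two cells that arrive too soon after the first are tagged rather than dropped. They are lost only if the buffer actually overflows.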

The model takes in cells over a feedback interval and gives a price to all the sources
sharing the link. The price reflects the congestion (if any) in the buffer and hence on the
virtual path. The feedback interval is short compared to the video frame time: a value of
about 0.05 of a frame time was chosen. To achieve feasible run times we neglect cell-scale
effects; this simplification is critical to speeding up the simulations.
The total utilisation of the link is high, at around 0.85.
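Two of the figures in this paragraph can be cross-checked against the source parameters given earlier. The sketch assumes a mean of 20 active data sources (the midpoint of the uniform 1 to 39 range), each sending one mean-sized file of 340 cells per frame time, and counts full 53-byte cells. Under those assumptions the offered load comes out close to the quoted utilisation of 0.85.

```python
FRAME_TIME_S = 1 / 30
feedback_interval_s = 0.05 * FRAME_TIME_S
print(f"feedback interval ~ {feedback_interval_s * 1e3:.2f} ms")  # ~ 1.67 ms

LINK_BPS = 155e6
video_bps = 20 * 2.3e6                             # 20 video sources, 2.3 Mbps mean
mean_data_sources = (1 + 39) / 2                   # uniform number of data sources
data_bps = mean_data_sources * 340 * 53 * 8 * 30   # 340-cell file per frame time
offered_load = (video_bps + data_bps) / LINK_BPS
print(f"offered load ~ {offered_load:.3f}")        # ~ 0.855
```

A feedback interval of about 1.67 ms is consistent with the earlier remark that the inelastic users are roughly stationary on the millisecond scale.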

Our results are shown in Table 1, which compares operation with and without pricing.
Both network efficiency and economic efficiency increase at the same time when pricing
is used: cell loss drops from 19% to under 2%, while the net benefits perceived by the
users increase by nearly 15%.

Table 1 Performance and Economic Gains from User Feedback.

Scheme     Source Type   % Loss   User Value   % Decrease in Loss   % Increase in Value
Unpriced   Inelastic     0        240
           Elastic       30.4     146
           Combined      19.1     386
Priced     Inelastic     4.4      239
           Elastic       0.1      204
           Combined      1.7      443          91.0                 14.8
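The last two columns of Table 1 follow from the Combined rows, and the Combined user values are the sums of the Inelastic and Elastic values. The short check below recomputes them from the rounded table entries. The small difference from the printed 91.0 presumably reflects rounding in the published loss figures.

```python
table = {
    "unpriced": {"inelastic": (0.0, 240), "elastic": (30.4, 146), "combined": (19.1, 386)},
    "priced":   {"inelastic": (4.4, 239), "elastic": (0.1, 204), "combined": (1.7, 443)},
}  # (% loss, user value) per source type, copied from Table 1

for scheme, rows in table.items():
    assert rows["inelastic"][1] + rows["elastic"][1] == rows["combined"][1], scheme

loss_before, value_before = table["unpriced"]["combined"]
loss_after, value_after = table["priced"]["combined"]
dec_loss = 100 * (loss_before - loss_after) / loss_before
inc_value = 100 * (value_after - value_before) / value_before
print(f"loss down {dec_loss:.1f}%, value up {inc_value:.1f}%")
```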
4  CONCERNS  ABOUT  USAGE-SENSITIVE  PRICING  IN  NETWORKS

We explore some common arguments against usage-sensitive pricing in network operations
in this section, and provide some counter-arguments. Some previous work along
these lines is contained in MacKie-Mason (1994).

• once a network is installed, any load-dependent costs of transferring data are minimal:
the fixed costs of network management and maintenance dominate. These
fixed costs can be efficiently recovered through connection fees and capacity prices.
Why implement an elaborate pricing mechanism to recover the relatively small variable
costs?

— Counterpoint: this ignores the congestion cost which one user’s traffic imposes
on other users sharing the resources. Bandwidth or buffer space occupied by
one user’s traffic is not available to other users. When this reduces other users’
quality of service (through increased delays, loss rates, blocking probabilities,
and so on), they suffer congestion costs which may translate into significant
actual costs of service degradation. One mechanism to capture these costs is a
price which is sensitive to some indicator of congestion, such as load.

• even if we want to consider congestion costs, how can the network determine what
actual costs the current load is imposing on users who probably have widely varying
service requirements? Getting users to reveal these costs is likely to be extremely
complicated, if not impossible.

— Counterpoint: it is true that providing users with the right incentives to reveal
their actual costs of service degradation is complicated. However, with any
prices that increase with the degree of congestion in the network, users will be
induced to prioritise their traffic. Only users who value their traffic at least as
much as the current price will transmit. If congestion remains unacceptably
high, then the price was too low; conversely, if capacity is unacceptably
under-utilised, the price was too high. Thus, through a process of experimentation
and dynamic adjustment, the network can shape the price schedule so that
users approximately reveal their valuations for uncongested service through
their responses to the prices.

• why won’t some non-pricing scheme be enough? Administrative controls can be
used to impose some appropriate notion of fairness, for example; or users can choose
a traffic priority level which matches their requirements.

— Counterpoint: who decides what is fair? The network operator can; but
according to a user-oriented objective, fairness should be determined collectively
by the users. We might all agree that telesurgery is more important
than email, but what about interactive video games versus email? Also, every
time a new application is developed it has to be slotted into the priority
order, an increasingly complex process. Suppose the network simply supports
priority levels and allows each user to choose their own level. Why wouldn’t
they all choose the highest priority? To guard against such abuses, there
would have to be some penalty for “inappropriate” declarations, implying the
need to define “appropriate” priority levels or to assign increasing charges to
higher priorities, e.g. Bohn (1993). A user’s choice of priority level would then
be based on economic considerations: balancing the benefits of higher priority
against the costs and/or the penalties for inflating their application’s perceived
priority level. Pricing represents the limiting case of a continuous spectrum of
priorities.

•  most  users  will  want  to  know  their  charges  in  advance,  and  will  not  want  to  deal 
with  prices  that  change  during  the  lifetime  of  a  typical  connection. 

— Counterpoint: we are not advocating that all users must face usage-sensitive
prices. Any user can choose not to face dynamic prices, even if their application
is adaptive. They would then be charged according to some other pricing
scheme, which should be co-ordinated with the usage-sensitive pricing mechanism.
Or a user faced with dynamic prices can choose to ignore those prices
by transmitting at their application’s natural information rate, and paying the
resulting charges. Finally, in our scheme (and in any realistic pricing scheme)
it would be possible for a user to set the maximum charge they are willing to
pay, which is what is usually required for budgetary purposes.

• bits/bytes/cells are not the correct units to charge for: it’s information that users
care about. Any scheme which proposes to look inside every data unit to determine
how it relates to other data units is likely to be too complex to be justified. Also,
lower-layer mechanisms (such as Ethernet collisions) or cell losses requiring higher-layer
retransmissions make it difficult to predict how much “raw” data has to be
transmitted to transfer a given amount of information. Should users be charged for
retransmissions that they have no control over, or cells which are dropped by the
network?

— Counterpoint: our scheme involves pricing for transport, not for content. The
“importance” of a particular cell, and its relation to other cells, is a higher-layer
issue determined by the application (or ultimately by the users). We
are not proposing that the network be aware of these issues; on the contrary,
the network view in our scheme is that it’s up to the users to decide how cells
are used to transfer information. It is in general impossible to predict exactly
how many cells are required to transmit a block of information, but again this
is a higher-layer issue. The basic question is whether the users or the network
should bear the uncertainty. If the network is expected to offer a “file transfer”
service, the file transfer charge per megabyte could be computed by averaging
over many such transfers. If the user is expected to pay for all transmitted cells,
their application could, for example, define a maximum number of cells they are
willing to transmit per megabyte of information, or a maximum amount they
are willing to pay per megabyte of information actually transferred.
•  dynamic pricing schemes are unworkable in practice because of the overheads involved
in accounting and billing for usage at such a detailed level. In addition, a significant
portion of the revenue raised is needed to defray the cost of doing dynamic pricing
in the first place.

—  Counterpoint  :  the  costs  of  dynamic  pricing  may  outweigh  the  benefits  for  a 
particular  implementation  but  we  do  not  believe  this  is  necessarily  true  for  all 
dynamic  pricing  schemes.  In  particular,  online  pricing  mechanisms  may  reduce 
the  actual  cost  to  an  acceptable  level;  there  is  no  reason  to  think  that  current 
billing  and  accounting  costs  in  other  industries,  such  as  telephone  or  electricity 
networks,  will  necessarily  apply  to  dynamic  pricing  in  ATM  networks.  This 
concern can only be answered for each scheme individually, and should obviously
be part of the overall decision on what usage-sensitive pricing scheme to
implement, if any.

•  dynamic pricing is impractical because users cannot respond to prices which are
updated  many  times  per  second.  If  the  update  interval  is  increased  to  the  minimum 
period  in  which  users  can  respond,  congestion  can  arise  and  disperse  in  between 
price  updates,  so  that  prices  no  longer  influence  user  behaviour. 

—  Counterpoint  :  our  scheme  assumes  an  intelligent  network  interface  at  price- 
sensitive  user  sites,  so  the  processing  necessary  to  respond  to  dynamic  prices 
would be done automatically based on pre-programmed user preferences. Current
ATM connection admission control schemes already assume enough user
intelligence  to  be  able  to  negotiate  quality  of  service  parameters,  so  our  scheme 
adds  a  little  more  complexity  rather  than  a  new  requirement.  This  software 
would  play  a  similar  role  to  current  TCP  implementations,  which  respond  to 
network  feedback  by  adjusting  their  traffic  inputs,  except  that  the  feedback  in 
our  case  is  the  current  price. 

•  charging for cells transmitted fails to capture cases where the benefit of a transfer
is  with  the  receiver.  If  senders  are  charged  for  receiver-initiated  transfers,  we  could 
see  a  drastic  reduction  in  the  number  of  open-access  servers  with  a  corresponding 
decrease  in  the  value  of  using  the  network. 

-  Counterpoint : we do not believe that associating the charge for a transmission
with the sender constrains the actual flow of money in any way. It is
easy to imagine multiparty connection protocols which initially negotiate each
party’s responsibility for the total charge, or “reverse-charges” servers which
only transmit data once the receiver has indicated willingness to pay the
resulting transmission costs.

•  with any form of usage-sensitive pricing, it’s the small users who will suffer the
most.  Rich  users  could  behave  as  they  want  since  they  have  the  resources,  and 
could  effectively  limit  the  network  access  of  smaller  users. 

—  Counterpoint :  with  our  scheme,  if  your  application  is  flexible  enough,  your 
charges would be zero. Many opponents of usage-sensitive pricing seem to
believe that it inherently involves charging for every cell/bit/etc. However, our
scheme explicitly recognises that if there is no congestion in the network, the
usage-sensitive price should be zero. Users who are flexible enough to wait
for such periods can then transmit for free.⁴ It would be a relatively simple
matter  to  set  a  maximum  price  per  cell  of  zero  and  let  your  network  interface 
determine  when  to  transmit  -  assuming  your  application  can  wait  for  the  price 
to  drop.  As  for  rich  users  being  able  to  afford  to  ignore  dynamic  prices,  this  is 
true  under  any  pricing  scheme  and  is  not  particular  to  dynamic  pricing.  We  do 
not  mean  to  dismiss  income  distribution  problems  as  unimportant,  but  merely 
to  say  that  network  pricing  (  or  non-pricing  )  is  not  the  right  venue  for  solving 
them. 
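The user-side rule mentioned above (set a maximum price per cell, possibly zero, and let the network interface decide when to transmit) can be sketched as follows. This is a minimal illustration under our own assumptions, not part of the scheme as specified: the function name `transmit_when_cheap` is hypothetical, the network is assumed to announce one price per slot, and the interface is assumed to send at most one queued cell per slot.

```python
from typing import List, Sequence, Tuple


def transmit_when_cheap(prices: Sequence[float], max_price: float,
                        cells_queued: int) -> Tuple[List[int], float]:
    """Hypothetical price-sensitive interface: send one queued cell in each slot
    whose announced price per cell is at most max_price.

    Returns the slots used for transmission and the total charge incurred.
    """
    sent: List[int] = []
    charge = 0.0
    for slot, price in enumerate(prices):
        if cells_queued == 0:
            break                      # nothing left to send
        if price <= max_price:         # the user's pre-programmed preference
            sent.append(slot)
            charge += price
            cells_queued -= 1
    return sent, charge


# With max_price = 0 the user only sends while the network is uncongested.
print(transmit_when_cheap([0.2, 0.0, 0.1, 0.0, 0.0], max_price=0.0, cells_queued=2))
# ([1, 3], 0.0)
```

A user who cannot wait would simply raise `max_price`, trading a higher charge for earlier delivery.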

•  usage-sensitive  pricing  is  just  another  way  for  network  operators  to  make  more 
money.  Users  will  lose  out  as  network  operators  maximise  their  profits. 

—  Counterpoint  :  it’s  true  that  there  is  the  potential  for  profiteering  whenever 
prices  are  charged,  especially  when  the  conditions  under  which  prices  are  set 
are not immediately accessible to ordinary users. But in a competitive
environment, network operators have market incentives to keep their margins of
revenue  over  actual  cost  as  low  as  possible.  This  incentive  is  missing  in  the 
case  of  a  monopoly  provider  or  a  cartel  of  price-fixing  providers.  But  whether 
abuse  is  possible  in  this  case  depends  on  policy  and  regulatory  decisions  rather 
than  on  the  specific  pricing  scheme. 

•  economics  is  important  in  network  planning  but  has  nothing  to  do  with  the  technical 
operation  of  a  network,  whether  public  or  private. 

-  Counterpoint : economics has a lot to do with network operation! Packet
switching was developed for computer communications because around 1970
it became more economical to use switching and routing to statistically
multiplex several connections onto one transmission link, rather than dedicating
one  circuit  to  each  connection  as  in  circuit-switching.  Economics  plays  a  role 
in  formulating  and  solving  decision  problems  in  all  types  of  network;  current 
price  schemes  differ  from  ours  in  the  frequency  with  which  prices  are  updated. 
Our  scheme  simply  moves  this  updating  into  “real-time”  for  those  users  who 
are  able  and  willing  to  respond  on  that  timescale. 


5  CONCLUSIONS 

We  have  presented  an  economic  framework  for  adaptive  users  in  ATM  networks.  Instead 
of  the  typical  requirement  for  traffic  descriptors  in  order  to  get  performance  guarantees, 
these flexible users can get loss guarantees if they adjust their traffic input rates in
response to dynamic feedback from the network. This is the basis for recent proposals for
ABR service in ATM. Our framework takes these proposals one step further by explicitly
defining how that feedback is generated by the network, and what form it takes. In our
scheme the network associates a cost measure with the utilisation of network resources,
announces a price which is based on the current cost, and price-sensitive users adjust
their cell inputs based on this price and their own specification of how valuable network
service is to them. While we address only the reactive control of adaptive users, our
scheme could be part of a more comprehensive billing and accounting scheme to charge
all users for network services.

⁴ If the price is never zero, the network is always congested and capacity expansion is indicated.

What  we  propose  is  to  give  users  incentives  to  consider  the  effects  of  their  usage  on 
other  users.  In  a  public  network,  where  the  users  cannot  be  assumed  to  be  cooperative, 
more  traditional  feedback  schemes  are  not  robust  to  user  manipulation  :  it  is  relatively 
easy  to  program  a  host  to  ignore  the  feedback  signals.  Of  course  it  would  be  just  as  easy 
to  ignore  price  signals;  but  since  users  would  be  liable  for  charges  they  incurred,  there  is 
some  incentive  to  respond. 

We  also  address  the  problem  of  user  service  valuations,  and  allow  for  adaptive  sources  to 
have  more  demanding  traffic  than  well-described  sources.  We  have  proposed  a  distributed 
iterative  pricing  algorithm  and  shown  (  by  simulation  )  that  it  is  possible  to  gain  both 
network  efficiency  and  economic  efficiency  by  using  pricing.  In  other  words,  the  network 
actually  carries  more  traffic  and  carries  more  important  traffic  from  the  users’  point  of 
view. 
Abstract

For a discrete-time Markovian model of a two-stage interconnection network, we develop methods
to find stochastic bounds on the number of lost cells at the second stage, from which we deduce
bounds on the loss rate. These methods are based on comparison results for stochastic processes,
Veinott’s criterion and lumpability of transition matrices. We study the output process of a
discrete-time Geom^X/D/1 queue at the first stage and compute geometric bounds on the output
process of this Geom^X/D/1 queue.


Keywords 

Multi-Stage  Interconnection  Network,  Discrete  time  Markovian  models,  Strong  Ordering,  Veinott’s 
Criterion,  Lumpability. 


1  INTRODUCTION 

Most telecommunication systems are composed of interconnected nodes. Messages routed from a
source to a destination through the interconnection network of nodes are generally cut into
numbered frames, also called packets, or cells in the context of the Asynchronous Transfer Mode
(ATM). Each node is a system which switches packets of messages from its input to the desired
output. Roughly speaking, a node is a black box composed of a commutation function (which ensures
the routing of a packet from the input to the output of the node) and a capacity to store packets
awaiting commutation. If a packet arrives at a full node it is lost. One of the most important
problems is to determine the capacity of a node such that very few packets are lost. This is
called the dimensioning problem. To address this problem we have to compute the loss rate (i.e.,
the ratio of the mean number of packets lost to the mean number of packets arriving at the node
considered).
The general context of our analysis is the one recommended by the CCITT, namely ATM. In this
transfer mode the packets have a fixed length (53 bytes) to ensure fast commutation in a node.
Because of the fixed length of the cells and the fixed commutation duration, time is discretized
into units called slots. Since the behavior of a node depends on the behaviors of the other nodes
connected to its inputs, and also on the behaviors of the sources, we first study only the
behavior of a node submitted to an input traffic. This traffic is modeled by a stochastic process,
which represents the combined behaviors of the sources and of the other nodes of the network.

The performance of a node or switch depends on its input traffic and its architecture. One of the
most studied architectures for a node is the Multi-stage Interconnection Network (MIN). This
architecture seems to be favored by industry and researchers alike, because MINs are cheap to
build and have attractive performance. Such a switch is composed of switching elements with very
few inputs and outputs (to ensure very fast commutation). These switching elements have a
commutation function and a capacity to store cells waiting to be switched. The switching elements
are connected by an interconnection network.

The final aim is to propose low-complexity numerical methods for the dimensioning problem of
MINs. These methods should allow a switch designer to choose the best possible configuration for
a switch submitted to several kinds of input traffic (e.g., Bernoulli traffic, bursty geometric or
ON/OFF traffic, and so on). In general a MIN is represented by a feed-forward queuing network, and
choosing a configuration means choosing the number of queues, their capacities, and the routing
network between the queues. Note also that it is very important to be able to observe the behavior
of a switch submitted to several kinds of traffic with different variances for the same mean
number of arrivals, because the loss rate depends on this variance.

In this paper we are interested in the dimensioning problem of MINs. The performance criterion
considered is the loss rate. This is a classical problem in telecommunication networks, but the
loss rate of $10^{-9}$ imposed in high-speed networks such as ATM networks is a new difficulty in
the performance evaluation of this kind of network, compared with the dimensioning of telephone
networks, where the imposed rate of rejected (or lost) calls is about $10^{-2}$. In practice,
discrete-time Markovian models of MINs have a state space with a very large number of states, and
this kind of network has no analytical solution (no product-form solution, for instance). Because
of the number of states in these models, the algorithmic complexity rules out computing the exact
solution, in particular for the loss rate. Using approximation techniques to reduce the state space
is a natural way to address the dimensioning problem, but pure approximation techniques must be
validated using rare or very rare event simulation techniques, which are generally very expensive.
So what we propose in this paper is to develop special approximation techniques that yield bounds,
which let us avoid validation by simulation. These bounds must be easy to compute (i.e., their
complexity must be much lower than that of computing the exact solution). The aim of this work is
to present the basic methods and the basic mathematical results needed to address the dimensioning
problem for a very simple model of a MIN: a Markovian model of a two-stage interconnection network
with Bernoulli input traffic. Of course this model is not realistic for an ATM switch, because
Bernoulli traffic is not realistic, but we focus our attention on the methodologies and will give
some indications of how to modify the basic methods for more general input traffic.

The  starting  point  to  obtain  bounds  is  to  use  some  structural  properties  of  the  model.  The 
basic  property  we  use  in  this  work  is  that  the  number  of  lost  cells  at  each  slot  t  in  a  queue  of 
a  switching  element  is  an  increasing  function  of  its  input  processes.  Now  what  we  want  is  to 
bound  the  output  process  of  a  queue  by  a  very  simple  renewal  process.  That  is  why  we  propose 
methods  for  finding  geometrical  bounds  on  the  output  stream  of  a  queue  using  comparison  results 
on  random  variables  and  processes.  We  use  the  strong  comparison  [Sto76]  because  it  is  generated 
by  increasing  functions  and  the  number  of  lost  cells  is  an  increasing  function  of  the  input  processes. 
The second idea is to reduce the state space (note that this is generally done by other
techniques); this is why we investigate results on lumpability [JJ60], [R.S91].

The paper is organized as follows. Section 2 deals with the mathematical results useful for
obtaining bounds on the loss rate; the main result concerns the comparison of Markov processes.
Section 3 is devoted to the description of the interconnection network model. Roughly speaking,
this system can be considered as the first two stages of a Clos interconnection network [Clo53].
In Section 4 we present methods based on lumpability results, and in Section 5 we present a method
based on Veinott’s Criterion to obtain an upper bound on the output stream of a Geom^X/D/1/FCFS/C
queue with service duration equal to 1. Section 6 is devoted to numerical examples of the bounding
methodologies and a discussion of these numerical results. We conclude in the last section.


2  MATHEMATICAL  RESULTS 

In this section we present the central result (see Proposition 2.3) concerning the comparison of
discrete-time Markov processes in the sense of the strong ordering. Before that, we need to
introduce some definitions and properties.

The strong ordering is the basic notion of this paper. This ordering is generated by nondecreasing
functions.

Definition 2.1 (Strong Ordering) Let $k > 0$ be an integer and let $X$ and $Y$ be two
$\mathbb{R}^k$-valued random variables. We say that $X$ is lower than $Y$ in the sense of the
strong ordering, and write $X \le_{st} Y$, iff for all functions $f : \mathbb{R}^k \to \mathbb{R}$
that are nondecreasing with respect to the componentwise ordering on $\mathbb{R}^k$, the inequality

$$E f(X) \le E f(Y)$$

holds, provided that the expectations exist.

The  strong  ordering  has  many  properties  and  the  reader  is  referred  to  [Sto76]  for  more  details 
on  this  subject  but  here  we  need  to  mention  the  following  property  which  will  be  used  to  simplify 
the  study. 
Proposition 2.1 (Comparing independent variables) Consider two vectors of independent random
variables, $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$. Then

$$X \le_{st} Y \iff \forall i, \; X_i \le_{st} Y_i.$$

Note that this proposition can be generalized to the case where $X_i$ (resp. $Y_i$) is a random
vector, for all $i = 1, \dots, n$.

To compare vectors of random variables in the sense of $\le_{st}$, we will use a sufficient
condition called Veinott’s Criterion (1965). We state it in the form given in [Sto76], adapted to
our case.

Proposition 2.2 (Veinott’s Criterion) Let $U = (U_0, \dots, U_t)$ and $V = (V_0, \dots, V_t)$ be
two vectors of random variables taking their values in $\mathcal{E} = \{0, \dots, M\}$. If

$$U_0 \le_{st} V_0 \qquad (1a)$$

and, for all $j = 1, \dots, t$ and all $(x, u, v) \in \mathcal{E} \times \mathcal{E}^j \times \mathcal{E}^j$ with $u \le v$ (componentwise),

$$\Pr\{U_j > x \mid U_0 = u_0, \dots, U_{j-1} = u_{j-1}\} \le \Pr\{V_j > x \mid V_0 = v_0, \dots, V_{j-1} = v_{j-1}\} \qquad (1b)$$

where $u = (u_0, \dots, u_{j-1})$ and $v = (v_0, \dots, v_{j-1})$, then $U \le_{st} V$.

The above criterion is very useful because it takes into account the fact that the random vectors
may have correlated components (which is the case when studying Markov processes).

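Because we are studying processes with state space $\mathcal{E} = \{0, \dots, M\}$, we give a
specialized version of Definition 2.1 for this particular case. Since the set of nondecreasing
functions on $\mathcal{E}$ is generated by the indicator functions $\mathbf{1}_{\{x \ge k\}}$,
$k = 0, \dots, M$, the $\le_{st}$-comparison can be expressed as follows.

Let $X$ and $Y$ be two random variables taking their values in $\mathcal{E}$, and let $p$ (resp.
$q$) be the row vector of the distribution of $X$ (resp. $Y$). We say that $X$ is lower than $Y$ in
the sense of the $\le_{st}$ ordering, and write $X \le_{st} Y$ (or equivalently $p \le_{st} q$), iff

$$p\,K_{st} \le q\,K_{st} \quad \text{componentwise} \qquad (2)$$

where $K_{st}$ is the following $(M+1) \times (M+1)$ matrix:

$$K_{st} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 & 0 \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix} \qquad (3)$$

so that the $k$th component of $p\,K_{st}$ is $\Pr(X \ge k)$.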
For instance, if $M = 2$ then

$$K_{st} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$

If $p = (0.1, 0.3, 0.6)$ and $q = (0.7, 0.2, 0.1)$, then computing $p\,K_{st} = (1, 0.9, 0.6)$ and
$q\,K_{st} = (1, 0.3, 0.1)$ we see that $p \ge_{st} q$.
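The comparison (2) is easy to check mechanically. The following Python sketch (the names `k_st`, `row_times` and `st_leq` are our own, hypothetical helpers) builds $K_{st}$ as in (3), computes the tail vectors $p\,K_{st}$, and reproduces the $M = 2$ example above; a small tolerance guards against floating-point rounding.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def k_st(m: int) -> Matrix:
    """The (M+1) x (M+1) matrix K_st of (3): entry (i, k) is 1 if i >= k, else 0."""
    return [[1.0 if i >= k else 0.0 for k in range(m + 1)] for i in range(m + 1)]


def row_times(p: Sequence[float], a: Matrix) -> List[float]:
    """Row vector p multiplied by matrix a."""
    return [sum(p[i] * a[i][k] for i in range(len(p))) for k in range(len(a[0]))]


def st_leq(p: Sequence[float], q: Sequence[float], tol: float = 1e-12) -> bool:
    """True iff p <=_st q, i.e. p K_st <= q K_st componentwise, as in (2)."""
    if len(p) != len(q):
        raise ValueError("p and q must be distributions on the same set {0, ..., M}")
    kst = k_st(len(p) - 1)
    return all(x <= y + tol for x, y in zip(row_times(p, kst), row_times(q, kst)))


p = [0.1, 0.3, 0.6]
q = [0.7, 0.2, 0.1]
print(row_times(p, k_st(2)))       # tail probabilities Pr(X >= k), approximately (1, 0.9, 0.6)
print(st_leq(q, p), st_leq(p, q))  # True False
```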


With  the  above  mathematical  tools  we  are  now  able  to  write  the  central  result  on  comparison 
of  Markovian  processes  in  the  following  proposition. 

Proposition 2.3 (Comparing Markov Processes) Let $\{X_t, t \ge 0\}$ and $\{Y_t, t \ge 0\}$ be two
Markov processes taking their values in $\mathcal{E}$. These processes are also denoted by
$(p_0, P)$ and $(q_0, Q)$, respectively: the row vector $p_0$ (resp. $q_0$) is the distribution of
$X_0$ (resp. $Y_0$), and $P$ (resp. $Q$) is the transition matrix of $\{X_t, t \ge 0\}$ (resp.
$\{Y_t, t \ge 0\}$). If

$$p_0 \le_{st} q_0 \qquad (4a)$$

and

$$P\,K_{st} \le Q\,K_{st} \quad \text{(term-by-term comparison)} \qquad (4b)$$

and $A = P$ or $A = Q$ is monotone, i.e.

$$\forall i \le j, \quad A_{i\cdot} \le_{st} A_{j\cdot} \qquad (4c)$$

where $A_{i\cdot}$ is the $i$th row of matrix $A$, then

$$\forall t, \quad X_t \le_{st} Y_t,$$

and moreover

$$\forall t, \quad (X_t, \dots, X_0) \le_{st} (Y_t, \dots, Y_0).$$


Proof :

For the point-by-point comparison see [Kei77]. For the vectorial comparison, note that for Markov
processes Veinott’s Criterion reduces to condition (1a), given by (4a), together with
$P_{u\cdot} \le_{st} Q_{v\cdot}$ for all $u \le v$, which follows from (4b) and the monotonicity
(4c). This vectorial comparison can also be obtained using a coupling argument (see Doisy [Doi92]). □
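As an illustration, the sufficient conditions (4a) to (4c) can be checked numerically, and the conclusion $X_t \le_{st} Y_t$ verified by iterating the marginal distributions. This is a minimal sketch under our own assumptions: the helper names and the two 3-state transition matrices are hypothetical examples chosen to satisfy the conditions, and only adjacent rows are compared for (4c), which suffices because $\le_{st}$ is transitive.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def k_st(m: int) -> Matrix:
    """K_st of (3): entry (i, k) is 1 if i >= k, else 0."""
    return [[1.0 if i >= k else 0.0 for k in range(m + 1)] for i in range(m + 1)]


def row_times(p: Sequence[float], a: Matrix) -> List[float]:
    """Row vector p multiplied by matrix a."""
    return [sum(p[i] * a[i][k] for i in range(len(p))) for k in range(len(a[0]))]


def st_leq(p: Sequence[float], q: Sequence[float], tol: float = 1e-12) -> bool:
    """p <=_st q via criterion (2)."""
    kst = k_st(len(p) - 1)
    return all(x <= y + tol for x, y in zip(row_times(p, kst), row_times(q, kst)))


def is_monotone(a: Matrix) -> bool:
    """(4c): rows are stochastically nondecreasing; adjacent rows suffice."""
    return all(st_leq(a[i], a[i + 1]) for i in range(len(a) - 1))


def check_prop_2_3(p0: Sequence[float], P: Matrix, q0: Sequence[float], Q: Matrix) -> bool:
    """True if the sufficient conditions (4a), (4b) and (4c) all hold."""
    cond_4a = st_leq(p0, q0)
    cond_4b = all(st_leq(P[i], Q[i]) for i in range(len(P)))  # P K_st <= Q K_st, row by row
    cond_4c = is_monotone(P) or is_monotone(Q)
    return cond_4a and cond_4b and cond_4c


def marginals(p0: Sequence[float], P: Matrix, steps: int) -> List[List[float]]:
    """Distributions of X_0, ..., X_steps for the chain (p0, P)."""
    dists = [list(p0)]
    for _ in range(steps):
        dists.append(row_times(dists[-1], P))
    return dists


# Illustrative chains on {0, 1, 2}: Q pushes the state upwards more than P does.
P = [[0.6, 0.4, 0.0], [0.3, 0.4, 0.3], [0.0, 0.4, 0.6]]
Q = [[0.4, 0.5, 0.1], [0.2, 0.4, 0.4], [0.0, 0.3, 0.7]]
p0 = q0 = [1.0, 0.0, 0.0]

print(check_prop_2_3(p0, P, q0, Q))  # True
print(all(st_leq(x, y) for x, y in zip(marginals(p0, P, 50), marginals(q0, Q, 50))))  # True
```

Swapping the roles of $P$ and $Q$ violates (4b), so the check then returns `False`.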

3  MODEL  DESCRIPTION 

In this section we present a simple model of a switch architecture submitted to Bernoulli input
traffic. It is a very simple model of an ATM switch, and not a very realistic one because of the
assumption on the input traffic.

The model of the ATM switch studied here is the same as that described in [BM93], [Bey93], where
most of the assumptions made are the same as in [KHM87]. After describing it, we present the
evolution equations and the fundamental properties from which the methods proposed here for
bounding the loss rate are derived. Finally, we present the fundamental assumptions for using our
bounding methodologies.

3.1  The  model 

Consider a two-stage interconnection switch with N entries and M outputs (see Fig. 2). We assume
that this system operates in discrete time, slot by slot, and we carry out our analysis at the
cell level.

We  make  the  hypothesis  that  departures  occur  before  arrivals  of  cells. 

The arrival process at each entry is geometric with parameter $p$, i.e., a cell arrives in each
slot with probability $p$. Arrival processes at two different inputs are assumed to be independent.
Note that in the ATM context this assumption does not allow us to treat bursty traffic; this will
be discussed at the end of this paper.


Figure 1: Switching element of stage k

We assume that each stage $k$, $k \in \{1, 2\}$, is composed of $E_k$ identical non-blocking
switching elements. A switching element at stage $k$ (see Fig. 1) has $a_k$ inputs and $s_k$
outputs. Each output is a queue with service duration 1 (slot), FCFS service discipline and finite
capacity $M_k$; this queue is denoted by $G^X/D/1/FCFS/M_k$. Note that for $k = 1$ the arrival
process at a queue of the first stage is a bulk geometric process, and the batch size is less than
or equal to $a_1$ (recall that $a_1$ is the number of inputs of a switching element at stage 1).

Let $(i, j, k)$ (resp. $(o, j, k)$) denote input $i$ (resp. output $o$) of switching element $j$
at stage $k$, $k \in \{1, 2\}$.

The commutation function of a switching element connects all input ports to all output ports.
This function is assumed to be uniform, i.e., the probability that a cell arriving at input
$(i, j, k)$ is switched to output $(o, j, k)$ is the same for all $o$, and is therefore equal to
$1/s_k$, independently of the state of the switch (i.e., the number of cells in each queue of the
switch).

Figure 2: A Two-stage Interconnection Network

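The interconnection network is modeled by the following bijection:

$$C : \{1, \dots, s_1\} \times \{1, \dots, E_1\} \times \{1\} \to \{1, \dots, a_2\} \times \{1, \dots, E_2\} \times \{2\}, \qquad (o, j, k) \mapsto C(o, j, k) = (j, o, k+1)$$

whose inverse is

$$C^{-1} : (i, j, k) \mapsto (j, i, k-1),$$

i.e., input $i$ of switching element $j$ at stage 2 is fed by output $j$ of switching element $i$
at stage 1. As an immediate consequence for the design of such a MIN we have:

$$a_2 = E_1, \qquad s_1 = E_2 \qquad (5)$$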
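The interconnection bijection $C$ and the design constraint (5) can be checked by enumeration. The sketch below is hypothetical (the names `connect`, `connect_inv` and `is_bijection` are ours) and encodes a port as a triple (port index, switching element, stage), following the notation $(o, j, k)$ and $(i, j, k)$.

```python
from typing import Tuple

Port = Tuple[int, int, int]  # (port index, switching element, stage)


def connect(o: int, j: int, k: int) -> Port:
    """C(o, j, k) = (j, o, k + 1): output o of element j at stage k feeds
    input j of element o at stage k + 1."""
    return (j, o, k + 1)


def connect_inv(i: int, j: int, k: int) -> Port:
    """C^{-1}(i, j, k) = (j, i, k - 1)."""
    return (j, i, k - 1)


def is_bijection(s1: int, e1: int, a2: int, e2: int) -> bool:
    """Check that C maps the stage-1 outputs one-to-one onto the stage-2 inputs."""
    outputs_1 = [(o, j, 1) for o in range(1, s1 + 1) for j in range(1, e1 + 1)]
    inputs_2 = {(i, j, 2) for i in range(1, a2 + 1) for j in range(1, e2 + 1)}
    image = [connect(*port) for port in outputs_1]
    return len(set(image)) == len(outputs_1) and set(image) == inputs_2


print(is_bijection(s1=4, e1=2, a2=2, e2=4))  # True: a2 = E1 and s1 = E2
print(is_bijection(s1=4, e1=2, a2=2, e2=3))  # False: s1 != E2
```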

3.2  Evolution  equations 

Let $N^k_{o,j}(t)$ denote the number of cells in queue $o$ of switching element $j$ at stage $k$
at slot $t$.

Because of the assumptions made above, the stochastic process $\{S(t), t \in \mathbb{N}\}$ is
Markovian, where $\mathbb{N}$ is the set of nonnegative integers and

$$S(t) = \big(N^k_{o,j}(t)\big)_{k = 1,2;\; j = 1, \dots, E_k;\; o = 1, \dots, s_k}.$$

The evolution equations of this system are:
$$\forall o, j, k: \quad \begin{cases} N^k_{o,j}(0) = 0 \\ N^k_{o,j}(t+1) = \min\big(M_k,\; (N^k_{o,j}(t) - 1)^+ + A^k_{o,j}(t)\big), & t \ge 0 \end{cases} \qquad (6)$$


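where $(x)^+ = \max(0, x)$ and $A^k_{o,j}(t)$ is the number of cells arriving at output $o$ of
switching element $j$ at stage $k$ during $]t, t+1]$. It is defined as follows:

$$A^k_{o,j}(t) = \sum_{i=1}^{a_k} I^k_{i,j}(t)\,\big\langle u^k_{i,j}(t) \,\big|\, e_o \big\rangle \qquad (7)$$

where $\langle \cdot \mid \cdot \rangle$ denotes the canonical scalar product in $\mathbb{R}^{s_k}$,
and, by definition of the connection between stages, $I^k_{i,j}(t) = \mathbf{1}_{\{N^{k-1}_{j,i}(t) > 0\}}$
for $k = 2$. Note that if $k = 1$ the variables $I^1_{i,j}(t)$ are i.i.d. Bernoulli distributed with
parameter $p$.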

The random variables $u^k_{i,j}(t)$ represent the commutation function of a switching element and
take their values in $\{e_1, \dots, e_o, \dots, e_{s_k}\}$, where $e_o$, $o = 1, \dots, s_k$, is the
row vector whose components are all equal to 0 except the $o$th, which is equal to 1. The event
$u^k_{i,j}(t) = e_o$ (or equivalently the event $\langle u^k_{i,j}(t) \mid e_o \rangle = 1$) means
that the cell arriving at input $i$ chose output $o$ of switching element $j$ at stage $k$.
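To make the evolution equations (6) and (7) concrete, the following Monte Carlo sketch simulates the two-stage network slot by slot and estimates $\pi(1)$ and $\pi(2)$ as ratios of total lost cells to total arriving cells, an empirical counterpart of the loss rate (9) defined in Section 3.4. It is an illustration under stated assumptions, not the authors' method: the function name and parameters are hypothetical, Python's `random` module supplies the Bernoulli inputs and the uniform switching, and $a_2 = E_1$, $E_2 = s_1$ are imposed as in (5). Such a simulation is only practical for moderate loss rates; at the $10^{-9}$ target mentioned in Section 1 it would need rare-event techniques, which is precisely what the bounding approach avoids.

```python
import random
from typing import Tuple


def simulate_two_stage(E1: int, a1: int, s1: int, s2: int, M1: int, M2: int,
                       p: float, slots: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of (pi(1), pi(2)) from the evolution equations (6)-(7).

    Stage 2 has a2 = E1 inputs per element and E2 = s1 elements, as required by (5).
    Losses and arrivals are summed over all queues of a stage and over all slots.
    """
    rng = random.Random(seed)
    a2, E2 = E1, s1
    n1 = [[0] * s1 for _ in range(E1)]  # n1[j][o] = N^1_{o,j}(t)
    n2 = [[0] * s2 for _ in range(E2)]  # n2[j][o] = N^2_{o,j}(t)
    arrived = [0, 0]
    lost = [0, 0]
    for _ in range(slots):
        # I^1_{i,j}(t): i.i.d. Bernoulli(p) cell arrivals at the switch entries.
        inputs1 = [[rng.random() < p for _ in range(a1)] for _ in range(E1)]
        # I^2_{i,j}(t) = 1{N^1_{j,i}(t) > 0}, taken from the stage-1 state at slot t.
        inputs2 = [[n1[i][j] > 0 for i in range(a2)] for j in range(E2)]
        for stage, (state, inputs, s, cap) in enumerate(
                ((n1, inputs1, s1, M1), (n2, inputs2, s2, M2))):
            for j, row in enumerate(inputs):
                counts = [0] * s  # A^k_{o,j}(t) for each output o
                for has_cell in row:
                    if has_cell:
                        counts[rng.randrange(s)] += 1  # uniform routing, 1/s_k per output
                for o in range(s):
                    backlog = max(state[j][o] - 1, 0) + counts[o]
                    lost[stage] += max(backlog - cap, 0)  # cells finding the queue full
                    arrived[stage] += counts[o]
                    state[j][o] = min(cap, backlog)       # equation (6)
    return tuple(lost_k / arr_k if arr_k else 0.0 for lost_k, arr_k in zip(lost, arrived))


print(simulate_two_stage(E1=4, a1=4, s1=4, s2=4, M1=2, M2=2, p=0.9, slots=3000))
```

The stage-2 indicators are computed before stage 1 is updated, so that they use $N^1_{j,i}(t)$ and not $N^1_{j,i}(t+1)$, as (7) requires.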


3.3  Fundamental  properties 

The following propositions give the fundamental properties of the model used in finding bounds on
the loss rate: at each slot $t$, the number of cells in a queue and the number of lost cells at the
same queue are nondecreasing functions of the input processes.

Proposition 3.1 (The number of cells is a nondecreasing function of the inputs) For all $k > 0$
and all $t > 0$ we have the following result.

For all $o, j$, $N^k_{o,j}(t)$ is a nondecreasing function of
$\big((I^k_j(t-1), \dots, I^k_j(0));\; (c^k_{o,j}(t-1), \dots, c^k_{o,j}(0))\big)$, where

$$\forall t', \quad I^k_j(t') = \big(I^k_{i,j}(t')\big)_{i = 1, \dots, a_k}$$

and

$$\forall t', \quad c^k_{o,j}(t') = \big(\langle e_o \mid u^k_{i,j}(t') \rangle\big)_{i = 1, \dots, a_k}.$$

Proof :

Using relations (6) and (7), the result is proved by induction on $t$. □

Moreover, if $\Pi^k_{o,j}(t)$ denotes the number of cells lost at queue $o$ of switching element
$j$ at stage $k$ during $]t, t+1]$, we have the following result:

Proposition 3.2 (The number of lost cells is a nondecreasing function of the inputs) For all
$k > 0$ and all $t > 0$ we have the following result.

For all $o, j$, $\Pi^k_{o,j}(t)$ is a nondecreasing function of
$\big((I^k_j(t), \dots, I^k_j(0));\; (c^k_{o,j}(t), \dots, c^k_{o,j}(0))\big)$, where $I^k_j(t')$ and
$c^k_{o,j}(t')$ are defined as in Proposition 3.1.

Proof :

We just have to note that the evolution equation of the number of lost cells is

$$\Pi^k_{o,j}(t) = \big((N^k_{o,j}(t) - 1)^+ + A^k_{o,j}(t) - M_k\big)^+, \quad t \ge 0 \qquad (8)$$

and then use the result of Proposition 3.1, together with the fact that, by (7), $A^k_{o,j}(t)$ is a
nondecreasing function of $(I^k_j(t), c^k_{o,j}(t))$. □

Finally, we note the following remark:

By assumption, the random variables $\{I^1_{i,j}(t)\}_{1 \le i \le a_1,\, 1 \le j \le E_1}$ are
independent for all $t \ge 0$. Then, because of the interconnection network between stage 1 and
stage 2, the random variables $\{I^2_{i,j}(t)\}_{1 \le i \le a_2}$ are also independent.

This last remark means that if the entries of a switch are independent, then the input ports of a
switching element of the second stage are also independent. This is another restriction on the
applicability of our bounding method to a switch model. But it means, for instance, that a Delta
network with independent inputs has the same property at each stage: the inputs of each switching
element are independent.


3.4  The  dimensioning  problem 

Define the loss rate at stage $k$, $k \in \{1, 2\}$, by

$$\pi(k) = \lim_{t \to +\infty} \frac{E\,\Pi^k_{o,j}(t)}{E\,A^k_{o,j}(t)} \qquad (9)$$


Then the dimensioning problem faced by the designer of a switch architecture can be characterized
by the following iterative algorithm, using either an exact or a bounding methodology:


DimSwitch

step 1 : Define the switch $(E_1, a_1, s_1, a_2, s_2)$
step 2 : Set the capacities of the queues $M_k$, $k = 1, 2$
step 3 : Compute $\pi(k)$, $k = 1, 2$, exactly (if possible), or compute lower and upper bounds on it
step 4 : If $\pi(1)$ and $\pi(2)$ are acceptable then goto step 5
         Else goto step 2
step 5 : If the designer decides it is OK then Stop
         Else goto step 1
This procedure means that the designer first has to fix an architecture (step 1). Then he has to
propose capacities for the queues (step 2). For the whole configuration the loss rate has to be
estimated (or bounded) (step 3). If the loss rate is acceptable (i.e., less than or equal to a given
value), the designer can decide either to stop the procedure or to explore a new architecture
(i.e., restart the whole procedure). Note that the decision to stop the design process is generally
based on an assessment of the economic cost of implementing the designed switch.
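Steps 2 to 4 of DimSwitch can be written as a simple search loop. In this sketch (hypothetical names throughout), `evaluate` stands for any procedure returning exact values or upper bounds on $(\pi(1), \pi(2))$ for capacities $(M_1, M_2)$ of an architecture fixed in step 1. The capacity of each stage whose loss rate is not yet acceptable is increased by one cell per iteration; this growth rule is our assumption, since the procedure above leaves the choice of new capacities to the designer. The toy evaluator is purely illustrative.

```python
from typing import Callable, Optional, Tuple

# Hypothetical evaluator: exact values or upper bounds on (pi(1), pi(2))
# for queue capacities (M1, M2) of a fixed architecture.
LossEvaluator = Callable[[int, int], Tuple[float, float]]


def dim_switch(evaluate: LossEvaluator, target: float,
               max_capacity: int = 64) -> Optional[Tuple[int, int]]:
    """Steps 2-4 of DimSwitch for one architecture fixed in step 1.

    Returns the first acceptable (M1, M2), or None once max_capacity is exceeded.
    Step 5 (accept, or go back to step 1) is left to the designer.
    """
    m1, m2 = 1, 1
    while m1 <= max_capacity and m2 <= max_capacity:
        pi1, pi2 = evaluate(m1, m2)           # step 3
        if pi1 <= target and pi2 <= target:   # step 4: acceptable
            return m1, m2
        if pi1 > target:                      # back to step 2 with larger capacities
            m1 += 1
        if pi2 > target:
            m2 += 1
    return None


def toy(m1: int, m2: int) -> Tuple[float, float]:
    """Toy evaluator standing in for an exact computation or a bounding method."""
    return 2.0 ** -m1, 3.0 ** -m2


print(dim_switch(toy, target=1e-3))  # (10, 7)
```

Accepting a configuration on the basis of upper bounds is safe, since the true loss rate is then also below the target; lower bounds alone could only be used to reject a configuration.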

3.5  Difficulty  of  the  problem 

Because  of  assumptions  on  input  processes  and  switching  function,  it  has  been  shown  (see  [Bey93], 
[])  that  the  behaviors  of  the  queues  of  the  first  stage  are  identical.  These  behaviors  are  represented 
by  the  processes  {fV^  ■(<),  t  €  IN}  which  are  identical  Markovian  processes  whose  transition  matrix 
denoted  by  T)  is  defined  as 


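$$T_1 = \begin{pmatrix}
b_0(p) & b_1(p) & b_2(p) & \cdots & b_{M_1-1}(p) & b_{M_1}(p) + \cdots + b_{a_1}(p) \\
b_0(p) & b_1(p) & b_2(p) & \cdots & b_{M_1-1}(p) & b_{M_1}(p) + \cdots + b_{a_1}(p) \\
0 & b_0(p) & b_1(p) & \cdots & b_{M_1-2}(p) & b_{M_1-1}(p) + \cdots + b_{a_1}(p) \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & b_0(p) & b_1(p) & b_2(p) + \cdots + b_{a_1}(p) \\
0 & \cdots & 0 & 0 & b_0(p) & b_1(p) + \cdots + b_{a_1}(p)
\end{pmatrix} \quad \text{if } a_1 > M_1,$$

$$T_1 = \begin{pmatrix}
b_0(p) & b_1(p) & \cdots & b_{a_1}(p) & 0 & \cdots & 0 \\
b_0(p) & b_1(p) & \cdots & b_{a_1}(p) & 0 & \cdots & 0 \\
0 & b_0(p) & b_1(p) & \cdots & b_{a_1}(p) & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\
\vdots & & \ddots & b_0(p) & b_1(p) & \cdots & b_{a_1-1}(p) + b_{a_1}(p) \\
\vdots & & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \cdots & \cdots & 0 & b_0(p) & b_1(p) + \cdots + b_{a_1}(p)
\end{pmatrix} \quad \text{otherwise,} \qquad (10)$$

for all $o$, $j$ and $t > 0$, where

$$b_i(p) = \Pr\left(A^{1}_{o,j}(t) = i\right) = \binom{a_1}{i} \left(\frac{p}{s_1}\right)^{i} \left(1 - \frac{p}{s_1}\right)^{a_1 - i} \qquad (11)$$

is the probability that $i$ cells arrive at queue $(o,j,1)$ at slot $t$.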
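As a minimal sketch (our own illustration, not the authors' code), the arrival law (11) and the matrix (10) can be built in Python with NumPy. We assume, as the shape of (10) indicates (rows 0 and 1 coincide), that in each slot a non-empty queue first sends one cell and the batch of arrivals is then admitted up to the capacity $M_1$, any excess being lost. The function names are hypothetical. A single entrywise rule covers both the case $a_1 > M_1$ and the banded case.

```python
from math import comb

import numpy as np


def arrival_probs(p, a1, s1):
    """b_i(p) of (11), i = 0..a1: each of the a1 inputs carries a cell with
    probability p, and that cell chooses this output with probability 1/s1."""
    q = p / s1
    return np.array([comb(a1, i) * q**i * (1 - q) ** (a1 - i)
                     for i in range(a1 + 1)])


def transition_matrix(p, a1, s1, M1):
    """T_1 of (10) on the queue lengths 0..M1.

    From state i, one cell leaves if i > 0, then n_arr cells arrive; the
    length is capped at M1, so every overflowing batch lands in column M1.
    """
    b = arrival_probs(p, a1, s1)
    T = np.zeros((M1 + 1, M1 + 1))
    for i in range(M1 + 1):
        after_departure = max(i - 1, 0)      # this is why rows 0 and 1 coincide
        for n_arr, prob in enumerate(b):
            T[i, min(after_departure + n_arr, M1)] += prob
    return T


T = transition_matrix(p=0.8, a1=2, s1=2, M1=23)   # first stage of (2, 2, 2, 2)
```

Every row of `T` sums to one. For $a_1 > M_1$ (for instance `transition_matrix(0.8, 4, 4, 2)`), the last column of rows 0 and 1 collects $b_{M_1}(p) + \cdots + b_{a_1}(p)$, as in the first form of (10).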

We note that $\pi(1)$ is computable: we only have to compute the steady-state probability vector of $T_1$. The only remaining problem is the exact computation of $\pi(2)$. To compute $\pi(2)$, we have to study a queue of the second stage together with the queues of the first stage connected to it. Since this kind of network has no analytical solution (based, for instance, on a product-form result), this is a problem of complexity $O\left(\left((M_1+1)^{E_1} (M_2+1)\right)^3\right)$.

The major difficulty is therefore the complexity of the problem: the exact value of $\pi(2)$ can only be obtained for a "small" switch with a few queues of very low capacity, which means that we cannot explore many switch configurations.
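By contrast, the computation of $\pi(1)$ mentioned above is cheap. The sketch below (our illustration, not the authors' implementation) solves $\pi T_1 = \pi$ with a dense linear solve, which is the $O(M_1^3)$ step, and takes the ratio of the mean number of cells lost per slot to the mean number $a_1 p / s_1$ of cells arriving per slot, as in (9). It assumes the same order of departures and arrivals as suggested by (10). The capacity convention is also our assumption, so the capacities returned by `smallest_capacity` (a hypothetical helper implementing steps 2 to 4 of DimSwitch for the first stage) need not match the paper's tabulated values exactly.

```python
from math import comb

import numpy as np


def stage1_chain(p, a1, s1, M1):
    """Arrival law b_i(p) of (11) and transition matrix T_1 of (10)."""
    q = p / s1
    b = np.array([comb(a1, i) * q**i * (1 - q) ** (a1 - i) for i in range(a1 + 1)])
    T = np.zeros((M1 + 1, M1 + 1))
    for i in range(M1 + 1):
        for n_arr, prob in enumerate(b):
            T[i, min(max(i - 1, 0) + n_arr, M1)] += prob
    return b, T


def stationary(T):
    """Steady-state vector: solve pi (T - I) = 0 together with sum(pi) = 1."""
    n = T.shape[0]
    A = T.T - np.eye(n)
    A[-1, :] = 1.0                 # replace one redundant equation by normalization
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)


def loss_rate_stage1(p, a1, s1, M1):
    """pi(1) of (9): mean cells lost per slot / mean cells arriving per slot."""
    b, T = stage1_chain(p, a1, s1, M1)
    pi = stationary(T)
    lost = sum(pi[i] * prob * max(max(i - 1, 0) + n_arr - M1, 0)
               for i in range(M1 + 1) for n_arr, prob in enumerate(b))
    return lost / (a1 * p / s1)


def smallest_capacity(p, a1, s1, target=1e-9, M_max=500):
    """Smallest M1 with pi(1) <= target, or None if M_max is not enough."""
    for M1 in range(1, M_max + 1):
        if loss_rate_stage1(p, a1, s1, M1) <= target:
            return M1
    return None
```

A useful sanity check is conservation of cells: in steady state the throughput $1 - \pi_0$ must equal $(a_1 p / s_1)(1 - \pi(1))$.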
However, noting that (see [Bey93])

$$\lim_{t \to +\infty} E\left(A^{2}_{o,j}(t)\right) = \frac{a_1 a_2}{s_1 s_2}\, p \left(1 - \pi(1)\right) \qquad (12)$$

we only have to focus our attention on the computation of $\lim_{t \to +\infty} E\left(L^{2}_{o,j}(t)\right)$. The result of Proposition 3.2 suggests the following fundamental remarks for developing bounding methodologies:

• if the input processes of a switching element at stage 2 are Bernoulli, then $\pi(2)$ is easy to compute;

• because the number of cells lost at each slot $t$ is an increasing function of the input process, we want to use the strong ordering;

• because of the variables $1_{\{N^{k}_{o,j}(t) > 0\}}$ (recall that the output stream of a queue $(o,j,k)$ is the stochastic process $\{1_{\{N^{k}_{o,j}(t) > 0\}}, t \in \mathbb{N}\}$), we want to use lumpability results [JJ60];

• because the input streams of a switching element at stage 2 are independent, the result of Proposition 2.1 shows that we only have to bound (in the sense of $\le_{st}$) the output stream of a queue of stage 1 by Bernoulli processes (also called geometrical processes).

Finally, we just have to solve the following problem. Consider a homogeneous and irreducible Markov chain $\{X_t, t \in \mathbb{N}\}$ with state space $S = \{0, \dots, M\}$, denoted from now on by $(\Gamma_0, T_x)$ (where $\Gamma_0$ is the initial condition and $T_x$ the transition matrix), and a process $\{Y_t, t \in \mathbb{N}\}$ such that, for all $t > 0$, $Y_{t+1}$ is an increasing function of the vector $(1_{\{X_t>0\}}, \dots, 1_{\{X_0>0\}})$. We want to find geometrical lower (resp. upper) bounds $G_{inf}$ (resp. $G_{sup}$) such that

$$\forall t > 0, \quad (G_{inf}(t), \dots, G_{inf}(0)) \le_{st} (1_{\{X_t>0\}}, \dots, 1_{\{X_0>0\}}) \le_{st} (G_{sup}(t), \dots, G_{sup}(0)) \qquad (13)$$


4  BOUNDS  BY  AGGREGATION 


The aim of this section is to find lower (resp. upper) Markov processes $X^{sl}_{inf}$ (resp. $X^{sl}_{sup}$) that are strongly lumpable according to the partition $B = (B(0), B(1))$, with $B(0) = \{0\}$ and $B(1) = \{1, \dots, M\}$, such that the processes $G_{inf} = \{1_{\{X^{sl}_{inf}(t) > 0\}}, t > 0\}$ and $G_{sup} = \{1_{\{X^{sl}_{sup}(t) > 0\}}, t > 0\}$ are delayed geometrical processes (see [Ros82]) and satisfy (13).

First  we  give  the  definition  of  strong  lumpability  in  our  special  case. 

Definition 4.1 (Strong Lumpability [JJ60]) A Markov chain $(\Gamma_0, T_x)$ is strongly lumpable according to $B$ iff

$$\forall k \in B(1), \quad T_x(k, B(0)) = \text{cste} = T_x(0, 0) \qquad (14)$$

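Theorem 4.1 (Main Result) Let $(\Gamma_0, T_x)$ be a Markov chain. The process $\{1_{\{X_t>0\}}, t > 0\}$ has a geometrical lower (resp. upper) bound with parameter $p^{sl}_{inf}$ (resp. $p^{sl}_{sup}$), where

$$p^{sl}_{inf} = \min_{k \in S} \sum_{j=1}^{M} T_x(k, j) \qquad (15)$$

and

$$p^{sl}_{sup} = \max_{k \in S} \sum_{j=1}^{M} T_x(k, j) \qquad (16)$$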
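Proof:

Because of the definitions of $p^{sl}_{inf}$ and $p^{sl}_{sup}$, we note that $T^{sl}_{inf} K_{st} \le T_x K_{st} \le T^{sl}_{sup} K_{st}$, with

$$T^{sl}_{inf} = \begin{pmatrix}
1 - p^{sl}_{inf} & p^{sl}_{inf} & 0 & \cdots & 0 \\
1 - p^{sl}_{inf} & p^{sl}_{inf} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 - p^{sl}_{inf} & p^{sl}_{inf} & 0 & \cdots & 0
\end{pmatrix} \qquad (17a)$$

$$T^{sl}_{sup} = \begin{pmatrix}
1 - p^{sl}_{sup} & 0 & \cdots & 0 & p^{sl}_{sup} \\
1 - p^{sl}_{sup} & 0 & \cdots & 0 & p^{sl}_{sup} \\
\vdots & \vdots & & \vdots & \vdots \\
1 - p^{sl}_{sup} & 0 & \cdots & 0 & p^{sl}_{sup}
\end{pmatrix} \qquad (17b)$$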


We see that the chains $(\Gamma_0, T^{sl}_{inf})$ and $(\Gamma_0, T^{sl}_{sup})$ are monotone Markov chains that are strongly lumpable according to $B$, so we can apply the result of Proposition 2.3.

To complete the proof, note that a $\{0,1\}$-valued Markov process with transition matrix $\begin{pmatrix} 1-p & p \\ 1-p & p \end{pmatrix}$ is a delayed geometrical process. □
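Theorem 4.1 is easy to check numerically. The sketch below (ours; NumPy assumed, function names hypothetical) computes (15) and (16) as the smallest and largest probability of jumping into $B(1)$, builds (17a) and (17b), and verifies the ordering through row-wise tail sums. We assume here that $K_{st}$ is the lower-triangular matrix of ones, so that right-multiplication by $K_{st}$ turns each row into its tail sums. Applied to $T_1$ of (10), it gives the values $1 - b_0(p)$ and 1 used in Section 6.

```python
from math import comb

import numpy as np


def stage1_matrix(p, a1, s1, M1):
    """T_1 of (10): one departure if non-empty, then a Binomial(a1, p/s1) batch."""
    q = p / s1
    b = [comb(a1, i) * q**i * (1 - q) ** (a1 - i) for i in range(a1 + 1)]
    T = np.zeros((M1 + 1, M1 + 1))
    for i in range(M1 + 1):
        for n_arr, prob in enumerate(b):
            T[i, min(max(i - 1, 0) + n_arr, M1)] += prob
    return T


def aggregation_bounds(T):
    """(15) and (16): extreme row masses sent to B(1) = {1, ..., M}."""
    to_b1 = T[:, 1:].sum(axis=1)
    return to_b1.min(), to_b1.max()


def bounding_matrices(T):
    """T_inf^sl of (17a) and T_sup^sl of (17b); all rows are identical."""
    p_inf, p_sup = aggregation_bounds(T)
    n = T.shape[0]
    T_inf = np.zeros((n, n))
    T_inf[:, 0] = 1 - p_inf
    T_inf[:, 1] = p_inf
    T_sup = np.zeros((n, n))
    T_sup[:, 0] = 1 - p_sup
    T_sup[:, -1] = p_sup
    return T_inf, T_sup


def tail_sums(X):
    """Entry (k, j) is the sum of X[k, j:], i.e. (X K_st)(k, j) under our assumption."""
    return np.cumsum(X[:, ::-1], axis=1)[:, ::-1]


def st_leq(A, B, tol=1e-12):
    """Row-wise strong ordering A K_st <= B K_st."""
    return bool(np.all(tail_sums(A) <= tail_sums(B) + tol))


T1 = stage1_matrix(0.8, 2, 2, 23)
p_inf, p_sup = aggregation_bounds(T1)      # approximately 1 - b_0(p) = 0.64, and 1
T_inf, T_sup = bounding_matrices(T1)
```

Because every row of (17a) and (17b) is the same, the lumped process $1_{\{X_t>0\}}$ of each bounding chain is i.i.d. after time 0, which is exactly the delayed geometrical structure invoked at the end of the proof.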


5  BOUNDS  USING  VEINOTT’S  CRITERION 

Another way to obtain bounds is to use Veinott's criterion directly. Because of the duality of the problem, we only develop the method for the upper bound. Consider a Markov chain $(\Gamma_0, T_x)$ with state space $S$. Our aim is to find a delayed geometrical upper bound $G_{sup}$ such that

$$G_{sup}(0) = 1 \qquad (18a)$$

$$\forall t \ge 1, \quad \text{the } G_{sup}(t) \text{ are i.i.d.} \qquad (18b)$$

and

$$\forall t, \quad (1_{\{X_t>0\}}, \dots, 1_{\{X_0>0\}}) \le_{st} (G_{sup}(t), \dots, G_{sup}(0)) \qquad (18c)$$

Definition 5.1 (Possible Sequence) For a fixed initial condition $\Gamma_0$, the sequence $B(u_0), \dots, B(u_t)$, with $u_i \in \{0,1\}$ (hence $B(u_i) \in B$) for all $i$, is possible iff

$$\Pr_{\Gamma_0}\left(X_t \in B(u_t), \dots, X_0 \in B(u_0)\right) > 0 \qquad (19)$$


Theorem 5.1 The parameter $p^{vc}_{sup}$ of the geometrical upper bound is such that

$$p^{vc}_{sup} = \min_{\Gamma_0} \sup_{t > 0,\ B(u_{t-1}), \dots, B(u_0)} \Pr_{\Gamma_0}\left(X_t \in B(1) \mid X_0 \in B(u_0), \dots, X_{t-1} \in B(u_{t-1})\right) \qquad (20)$$

where $B(u_0), \dots, B(u_{t-1})$ is a possible sequence for the initial condition $\Gamma_0$.
Proof:

For a fixed initial condition $\Gamma_0$, a sufficient condition for (18a)-(18c) to hold for all $t > 0$ can, using Veinott's criterion (1a)-(1b), be written simply (because the $G_{sup}(t)$ are i.i.d.) as

$$\forall t,\ \forall (x, u) \in \{0,1\} \times \{0,1\}^{t}, \quad \Pr_{\Gamma_0}\left(1_{\{X_t>0\}} \ge x \mid 1_{\{X_{t-1}>0\}} = u_{t-1}, \dots, 1_{\{X_0>0\}} = u_0\right) \le \Pr\left(G_{sup}(t) \ge x\right) \qquad (21)$$

The case $x = 0$ is trivially satisfied. The result then follows from the definition of the supremum and from the fact that we want the smallest possible geometrical bound. □

It remains to give a condition of existence and a way of computing this bound. Let $T_x^+ = [T_x(i,j)]_{(i,j) \in B(1) \times B(1)}$; we then have the following result.

Theorem 5.2 (Main result) If $T_x^+$ is a positive matrix such that the following assumptions hold:

A1 : $T_x^+$ has no null row vector, or $T_x^+$ is invertible;

A2 : $T_x^+$ is diagonalizable in $\mathbb{C}$, the set of complex numbers, or irreducible;

then the parameter $p^{vc}_{sup}$ exists and is computable using the following algorithm, where $r(T_x^+)$ is the maximum eigenvalue of $T_x^+$:

Begin

  w := (T_x(0,1), ..., T_x(0,M))
  max := max(r(T_x^+), ||w T_x^+||_1)    (* where ||(x_1, ..., x_n)||_1 = x_1 + ... + x_n *)
  While (1) do    (* loop *)
    w := w T_x^+ / ||w T_x^+||_1
    max := max(max, ||w T_x^+||_1)
  enddo    (* end loop *)

End.

This algorithm converges.


Proof:

The fact that $r(T_x^+)$ exists and is associated with a distribution vector follows from the Perron-Frobenius-Gantmacher theory [Gan64].

Noticing that $1_{\{X_t>0\}} = u_t \in \{0,1\}$ has exactly the same meaning as $X_t \in B(u_t)$, $u_t \in \{0,1\}$, we use a result of Rubino et al. [RS91] to write:

$$\Pr_{\Gamma_0}\left(X_t \in B(1) \mid X_0 \in B(u_0), \dots, X_{t-1} \in B(u_{t-1})\right) = \Pr_{\beta}\left(X_1 \in B(1)\right)$$
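with $\beta = f(\Gamma_0, B(u_0), \dots, B(u_{t-1}))$, where $f$ is defined for every possible sequence by

$$\begin{cases}
f(\Gamma_0, B(u_0)) = (\Gamma_0)_{B(u_0)} \\
f(\Gamma_0, B(u_0), \dots, B(u_k)) = \left(f(\Gamma_0, B(u_0), \dots, B(u_{k-1}))\, T_x\right)_{B(u_k)}
\end{cases} \qquad (22)$$

where $\alpha_C$, $C \in B$, denotes the row vector with Card(C) components defined, for any distribution vector $\alpha$ such that $\sum_{k \in C} \alpha(k) \neq 0$, by

$$\forall i, \quad \alpha_C(i) = \begin{cases} \dfrac{\alpha(i)}{\sum_{k \in C} \alpha(k)} & \text{if } i \in C, \\ 0 & \text{otherwise.} \end{cases} \qquad (23)$$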


Using the fact that $f(\Gamma_0, B(u_0), \dots, B(0), B(u_i), \dots, B(u_k)) = f(1_{B(0)}, B(u_i), \dots, B(u_k))$, where $1$ denotes the row vector with all its components equal to 1, we only have to consider two infinite sequences: $B(0), B(1), \dots, B(1), \dots$ and $B(1), B(1), \dots, B(1), \dots$. These two special sequences mean that, in fact, we just have to study the time spent in the partition $B(1)$ when at instant 0 we were in $B(0)$ or already in $B(1)$.


Then, noticing that

$$f(\Gamma_0, B(0), B(1), \dots, B(1)) = f(1_{B(0)}, B(1), \dots, B(1)),$$

$$f((0, v_1), B(1), \dots, B(1)) = (0, v_1),$$

with $v_1$ the probability vector associated with the eigenvalue $r(T_x^+)$,

$$\left((0, v)\, T_x\right)_{B(1)} = \frac{v\, T_x^+}{\|v\, T_x^+\|_1},$$

and that the sequence defined by

$$w_{t+1} = \frac{w_t\, T_x^+}{\|w_t\, T_x^+\|_1}, \qquad w_0 \ge 0, \quad \|w_0\|_1 = 1,$$

is such that $\lim_{t \to +\infty} \|w_t\, T_x^+\|_1 = r(T_x^+)$, the result is obtained. □
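A direct transcription of the algorithm of Theorem 5.2 is sketched below (our illustration, NumPy assumed, names hypothetical). Two choices are ours: the infinite While loop is replaced by a stopping rule on successive values of $\|w T_x^+\|_1$, and $r(T_x^+)$ is taken as the spectral radius computed from `numpy.linalg.eigvals`, which for a nonnegative matrix is its Perron root. For the first-stage matrices $T_1$ of the configurations (2, 2, 2, 2) and (4, 2, 4, 4), with $M_1 = 23$ and $M_1 = 7$ as in Table 1, it returns values close to the $p^{vc}_{sup}$ column (0.956 and 0.617).

```python
from math import comb

import numpy as np


def stage1_matrix(p, a1, s1, M1):
    """T_1 of (10): one departure if non-empty, then a Binomial(a1, p/s1) batch."""
    q = p / s1
    b = [comb(a1, i) * q**i * (1 - q) ** (a1 - i) for i in range(a1 + 1)]
    T = np.zeros((M1 + 1, M1 + 1))
    for i in range(M1 + 1):
        for n_arr, prob in enumerate(b):
            T[i, min(max(i - 1, 0) + n_arr, M1)] += prob
    return T


def veinott_upper_parameter(T, tol=1e-13, max_iter=200_000):
    """Algorithm of Theorem 5.2 for a chain on {0, ..., M} with B(1) = {1, ..., M}."""
    T_plus = T[1:, 1:]                               # T_x^+ on B(1) x B(1)
    r = float(np.max(np.abs(np.linalg.eigvals(T_plus))))
    w = T[0, 1:].copy()                              # w := (T_x(0,1), ..., T_x(0,M))
    best = max(r, float((w @ T_plus).sum()))         # entries >= 0, so sum = ||.||_1
    previous = None
    for _ in range(max_iter):
        v = w @ T_plus
        w = v / v.sum()                              # w := w T_x^+ / ||w T_x^+||_1
        mass = float((w @ T_plus).sum())             # ||w T_x^+||_1
        best = max(best, mass)
        if previous is not None and abs(mass - previous) < tol:
            break
        previous = mass
    return best


p_vc_sup = veinott_upper_parameter(stage1_matrix(0.8, 2, 2, 23))
```

Each normalized $w$ is the conditional distribution of the queue length given that the queue has stayed non-empty, so $\|w T_x^+\|_1$ is the probability of remaining in $B(1)$ one more slot; the loop tracks the largest such value, which here approaches $r(T_x^+)$.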


6  NUMERICAL  RESULTS 

In this section we give numerical results on the dimensioning problem for a two-stage interconnection network, with a target loss rate less than or equal to $10^{-9}$.

The configuration of such a switch is completely defined by the 4-tuple $(E_1, a_1, s_1, s_2)$ (note that, because of the way the switch is connected, $a_2 = E_1$).

We have fixed the parameter of the input Bernoulli processes at $p = 0.8$: at each slot and for any input port, a cell arrives with probability 0.8. This value was chosen because it corresponds to fairly heavy traffic on the input ports of the switch.

To obtain the delayed geometrical processes that bound the output stream of a queue at the first stage of the switch, we apply the results of Sections 4 and 5 to the Markov chain $(\cdot, T_1)$, where $T_1$ is the transition matrix defined by (10). Applying the results of Section 4, we have:
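$$\begin{pmatrix}
b_0(p) & 1 - b_0(p) & 0 & \cdots & 0 \\
b_0(p) & 1 - b_0(p) & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
b_0(p) & 1 - b_0(p) & 0 & \cdots & 0
\end{pmatrix} \le_{st} T_1 \le_{st} \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 0 & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}$$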

so the aggregation technique gives $p^{sl}_{inf} = 1 - b_0(p)$ and $p^{sl}_{sup} = 1$ (the saturation case). The value of $p^{sl}_{inf}$ has a simple interpretation: the output process of a Geom$^X$/D/FCFS/$M_1$ queue is lower bounded by the output process of the particular queue Geom$^X$/D/FCFS/1.

For the switch configuration (2, 2, 2, 2) with $p^{sl}_{sup} = 1$ and $M_2 = 500$, the loss rate is still as high as $7.8 \times 10^{-4}$, whereas with $p^{vc}_{sup} = 0.956$ the loss rate of $10^{-9}$ is reached as soon as $M_2 \ge 100$. The upper bound obtained with the result of Section 5 is therefore much more efficient than the one obtained with the result of Section 4 (which is the worst case). This is why we focus on the bounds on the second-stage loss rate obtained with the geometrical parameters $p^{sl}_{inf}$ and $p^{vc}_{sup}$.

Table 1 gives, for each switch configuration, the parameters of the bounding geometrical output streams, $p^{sl}_{inf}$ (obtained with the results of Section 4) and $p^{vc}_{sup}$ (obtained with the algorithm of Theorem 5.2), the capacity $M_1$ of a first-stage queue for which the loss rate is $10^{-9}$, and the capacities $M_2^{sup}$ and $M_2^{inf}$ of the second-stage queues for which the loss rate is equal to $10^{-9}$ when the arrival process at an input port of the second stage is geometrically distributed with parameter $p^{vc}_{sup}$ and $p^{sl}_{inf}$, respectively.

The values in each column are computed as follows. First we fix a switch configuration (first column). Then we compute $M_1$ (fourth column) such that the loss rate is equal to $10^{-9}$ when the input process is geometrically distributed with parameter $p = 0.8$. Next we compute $p^{sl}_{inf}$ using the results of Section 4 and $p^{vc}_{sup}$ using the algorithm of Theorem 5.2. Finally, we compute $M_2^{sup}$ and $M_2^{inf}$ such that the loss rate is equal to $10^{-9}$, using the same procedure as for $M_1$ but with $p$ replaced by $p^{vc}_{sup}$ and $p^{sl}_{inf}$, respectively.


(E1, a1, s1, s2)   p^sl_inf   p^vc_sup   M1   M2^sup   M2^inf
(2, 2, 2, 2)       0.64       0.956      23   100      12
(10, 2, 2, 10)     0.64       0.956      23   172      21
(4, 2, 2, 8)       0.64       0.956      23   13       9
(4, 2, 4, 4)       0.36       0.617      7    17       9
(4, 2, 4, 8)       0.36       0.617      7    9        6
(5, 4, 4, 5)       0.59       0.968      34   210      17
(5, 4, 4, 10)      0.59       0.968      34   13       9
(2, 4, 8, 2)       0.344      0.672      10   15       6
(2, 4, 8, 4)       0.344      0.672      10   7        5
(2, 4, 8, 6)       0.344      0.672      10   5        4

Table 1
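The column-by-column procedure described above can be sketched as follows (our illustration, not the authors' code; names hypothetical). We assume that a second-stage queue sees $a_2 = E_1$ independent Bernoulli inputs with parameter $q$ ($q = p^{vc}_{sup}$ or $q = p^{sl}_{inf}$), each cell choosing this queue with probability $1/s_2$, and we reuse for it the slot dynamics of (10). `capacity` then performs the search used for $M_1$, $M_2^{sup}$ and $M_2^{inf}$. Under these assumptions the $p^{sl}_{inf}$ column depends only on $(a_1, s_1)$, since $p^{sl}_{inf} = 1 - b_0(p) = 1 - (1 - p/s_1)^{a_1}$.

```python
from math import comb

import numpy as np


def loss_rate(q, n_in, n_out, M):
    """Loss rate of a slotted queue of capacity M fed by n_in Bernoulli(q)
    inputs, each cell joining this queue with probability 1/n_out."""
    r = q / n_out
    b = [comb(n_in, i) * r**i * (1 - r) ** (n_in - i) for i in range(n_in + 1)]
    T = np.zeros((M + 1, M + 1))
    for i in range(M + 1):
        for k, pk in enumerate(b):
            T[i, min(max(i - 1, 0) + k, M)] += pk
    A = T.T - np.eye(M + 1)
    A[-1, :] = 1.0                                   # normalization row
    rhs = np.zeros(M + 1)
    rhs[-1] = 1.0
    pi = np.linalg.solve(A, rhs)                     # steady-state vector
    lost = sum(pi[i] * pk * max(max(i - 1, 0) + k - M, 0)
               for i in range(M + 1) for k, pk in enumerate(b))
    return lost / (n_in * r)


def capacity(q, n_in, n_out, target=1e-9, M_max=1000):
    """Smallest capacity M whose loss rate is <= target, or None."""
    for M in range(1, M_max + 1):
        if loss_rate(q, n_in, n_out, M) <= target:
            return M
    return None


def p_sl_inf(p, a1, s1):
    """Aggregation parameter 1 - b_0(p) of a first-stage output stream."""
    return 1 - (1 - p / s1) ** a1


def table_row(E1, a1, s1, s2, p, p_vc_sup):
    """(M1, M2_sup, M2_inf) for one configuration, with a2 = E1."""
    return (capacity(p, a1, s1),
            capacity(p_vc_sup, E1, s2),
            capacity(p_sl_inf(p, a1, s1), E1, s2))
```

When the second-stage load $E_1 q / s_2$ is close to 1 (for instance the saturation case $q = 1$ with $E_1 = s_2$), the loss rate decreases only slowly with $M$ and the search may run all the way to `M_max`.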
6.1 Discussion

First of all, recall that the saturation case (i.e., $p^{sl}_{sup} = 1$) is the worst case for some configurations, such as (2, 2, 2, 2) or (2, 4, 8, 2). But when the configuration is "good" (i.e., when the number of outputs is much greater than the number of inputs for all switching elements), the results indicate that even the saturation case is effective for addressing the dimensioning problem. For example, for the configuration (2, 4, 8, 6), the loss rate of $10^{-9}$ is obtained as soon as $M_2 \ge 6$ with $p^{sl}_{sup}$, and as soon as $M_2 \ge 4$ with $p^{sl}_{inf}$.

For all the configurations explored here, the maximum error is about 1100%, which could be sufficient to choose the best configurations, here (4, 2, 4, 4), (4, 2, 4, 8), (2, 4, 8, 2), (2, 4, 8, 4) and (2, 4, 8, 6).

Last but not least, we have to mention here the most important result of this work. We have found an approximate method which allows us to address the dimensioning problem (for a simple model, of course) and which:

• computes an upper bound $M_2^{sup}$ on the capacity of the queues at the second stage, which ensures that the loss rate of a queue with capacity $M_2^{sup}$ is less than $10^{-9}$;

• computes a lower bound $M_2^{inf}$ on the capacity of the queues at the second stage, which allows us to compute an upper bound $M_2^{sup} - M_2^{inf}$ on the error made.

As a final remark, note that this information is available for all possible configurations of the switch, and that the results do not need to be validated by rare-event simulation.

7  CONCLUSION 

We have proposed bounding methodologies to address the dimensioning problem of a simple ATM switch. We found delayed geometrical bounds on the output stream of a Geom$^X$/D/1/M queue with finite capacity M at the first stage of this switch model, which allow us to bound the loss rate. Except for the bounds based on the saturation case (which is the worst possible case), this method only applies to the dimensioning of a queue of a switching element that is connected to independent queues of the previous stage. In other words, this set of methods could be applied to the dimensioning problem of a Delta switch, but we know at first sight that it will give poor results when dimensioning the third stage of a Clos network.

The key ideas of this work are to use results on lumpability and Veinott's criterion, which is a sufficient condition for the comparison of two random vectors in the sense of the strong ordering. In both approaches the aim is to reduce the state space. For the methods based on lumpability, the idea is to find two stochastic matrices that bound a given stochastic matrix; these matrices must be strongly lumpable. For the method based on Veinott's criterion, we have noticed that in some cases it is exactly the same as the weak lumpability results (see [RS91], for instance). The bounds we found are not optimal, except when the transition matrix describing the evolution of the number of customers in a Geom$^X$/D/1/M queue is strongly or weakly lumpable. A case in which this transition matrix is trivially strongly lumpable is when the capacity M of the queue is equal to 1.

The complexity of these methods is very attractive. For the methods based on the lumpability results (see Section 4), the complexity of obtaining the loss rate at the second stage is $O(M_1^3 + M_2^3)$. For the application of Veinott's criterion (see Section 5), the computation of $p^{vc}_{sup}$ is very fast in practice (i.e., it takes much less time than the computation of the loss rate), and the computation of the loss rate with $p^{vc}_{sup}$ is still $O(M_2^3)$.

What we want to stress is that the set of methods presented in this paper contains methods which guarantee a loss rate less than or equal to a given value (typically $10^{-9}$ in the ATM context) and give an upper bound on the error made. This result is obtained using numerical methods only, without any simulation. The bounds could be improved: for example, the upper bound obtained by saturation could easily be improved by keeping queues of the first stage on the input ports of a switching element of the second stage.

However, these methodologies have to be extended in two directions to be of wider interest to the ATM performance evaluation community: the first concerns the input traffic, the second the number of stages.

7.1 Input traffic

One of the most restrictive assumptions made in this work concerns the input traffic, which is Bernoulli. If the input traffic is modeled by a Markovian process, the results can still be applied, but of course their precision will be worse. One possible way to avoid this is to modify the method based on Veinott's criterion, which is left for further work.

7.2  Adding  stages 

Probably the most important problem is the computation of performance measures for MINs (multistage interconnection networks) with more than two stages. Assuming that the input processes are independent, the correlation between the input processes of switching elements at the later stages is due mostly to the interconnection network of the switching elements. This means that the method proposed here could be adapted to compute the loss rate at the third stage of a Delta network, but the bounding methodologies could not be used to estimate the loss rate at the third stage of a Clos switch. Future work will have to focus on this problem.
Abstract

We describe a diffusion approximation model for an ATM statistical multiplexer using the instantaneous return model approach (Gelenbe, 1975). Two cell loss estimates are proposed for multiclass traffic. Our aim is to provide novel conservative, accurate and computationally efficient methods for predicting cell loss probabilities, which we call the Finite Buffer Diffusion Cell Loss Estimate (FBDCLE) and the Infinite Buffer Diffusion Cell Loss Estimate (IBDCLE). We evaluate their accuracy by comparing them with simulation results for a wide variety of input traffic characteristics; in particular, we test the model with traffic that is a mixture of different "On-Off" sources with varying loads. Both homogeneous and heterogeneous aggregated arrival processes have been taken into account. These comparisons, which include evaluations of the statistical confidence of the simulation runs, show that our model predictions are very close to the simulation results. In particular, FBDCLE is a conservative upper bound on the cell loss ratio, while IBDCLE provides an accurate predictor which may slightly under-estimate or over-estimate cell loss.


Keywords 

ATM  network  performance  prediction,  quality  of  service,  queueing  theory,  diffusion  model, 
call  admission  control,  bandwidth  allocation. 
1 INTRODUCTION

ATM  provides  a  universal  carrier  service  that  can  carry  voice,  data  and  video  using  the 
same  cell  transport  arrangement.  This  technique  allows  complete  flexibility  in  the  choice  of 
connection  bit  rate  and  enables  the  statistical  multiplexing  of  variable  bit  rate  traffic  streams. 
On  the  other  hand  it  also  introduces  a  risk  of  overload,  due  to  traffic  variations  which  may 
cause  network  capacity  to  be  exceeded.  Overload  is  the  main  cause  of  cell  loss  and  jitter  in 
such  systems.  Thus  the  performance  analysis  of  ATM  multiplexers  is  critical  to  the  design 
and  analysis  of  appropriate  control  mechanisms  for  call  admission,  bandwidth  allocation  and 
bandwidth adaptation. Although much work has been done on computing the cell loss ratios or probabilities that result from a given ATM multiplexer in the presence of a given traffic (Kobayashi et al., 1993) (Heffes et al., 1986) (Sriram et al., 1986) (Akimaru et al., 1994), there is still much room for improvement in the methods used for finding computationally effective, fast and tight estimates of cell loss.

Typically,  call  admission  and  bandwidth  adaptation  controls  use  estimates  of  cell  loss 
ratio  for  a  given  description  of  the  incoming  traffic  at  an  ATM  multiplexer  or  along  a  path 
traversing  a  series  of  multiplexers.  For  instance  the  call  admission  control  policy  used  in 
IBM’s  ATM  architectures  (Guerin  et  al.,  1992)  bases  its  bandwidth  allocation  conservatively 
using  the  minimum  of  two  cell  loss  estimates:  one  based  on  equivalent  bandwidth  and  the 
other  on  a  Gaussian  approximation  of  cell  loss  probability.  Therefore  more  accurate  estimates 
of cell loss probabilities will necessarily lead to better decisions for call admission. Thus it is important to be able to estimate cell loss ratios over a very wide range, from $10^{-1}$ at the high end to less than $10^{-7}$ at the low end. It is important that the estimates obtained be conservative, i.e. that they be upper bounds, so that any bandwidth allocation based on these estimates does not result in cell loss ratios higher than those estimated. However, it is also essential that the estimate be a tight upper bound, so that it does not result in the wasteful allocation of excessive bandwidth. Another consideration for any tool used for estimating cell loss is its computational cost. Many of the decision-making processes which use such estimates will have to be carried out in real time at low computational cost. Therefore our research aims at obtaining a tight, conservative and computationally effective method for estimating cell loss in an ATM multiplexer from given traffic characteristics. This paper uses diffusion approximations to contribute:


•  a  conservative  cell  loss  ratio  estimate  we  name  FBDCLE  ( Finite  Buffer  Diffusion  Cell 
Loss  Estimate), 

•  and  a  tight  estimate  we  call  IBDCLE  ( Infinite  Buffer  Diffusion  Cell  Loss  Estimate), 


for superposed multiclass "On-Off" traffic. We use simulations to show the validity of FBDCLE and IBDCLE in the cell loss ratio range between $10^{-1}$ and $10^{-5}$.

We describe the diffusion model in Section 2. In Sections 3 and 4 we derive the FBDCLE and the IBDCLE, respectively. In Section 5 we use the two estimates to compute cell loss ratios for multiple-class "On-Off" traffic, and compare the analytical results with simulations for a wide variety of input traffic characteristics and different loads.
2 THE DIFFUSION MODEL


Diffusion  approximations  are  continuous  approximations  to  the  discontinuous  arrival  and 
service  processes  in  queueing  models.  They  have  long  been  used  in  queueing  theory  to  model 
traffic and service. Their advantage is that, for more detailed traffic representations, they generally result in performance models that are computationally more tractable than what can be obtained from a direct study of the corresponding discrete processes. In the past, two
different  approaches  to  diffusion  approximations  for  queueing  models  have  been  proposed.  In 
both  cases  whenever  the  queue  length  is  non-zero  and  the  maximum  buffer  capacity  has  not 
been  attained,  the  queue  length  distribution  is  approximated  by  solving  a  partial  differential 
equation.  However  the  two  methods  differ  according  to  the  choice  of  boundary  conditions. 
The  simpler  one  uses  reflecting  boundaries  (Kobayashi,  1974)  (Kobayashi  et  al. ,  1993)  so  that 
no  probability  mass  accumulates  at  the  boundaries.  Clearly  this  approach  will  not  be  totally 
satisfactory  if  the  boundaries  themselves  are  very  important  to  the  process  being  modeled. 
The  more  sophisticated  approach  is  based  on  the  “instantaneous  return  process”  (Gelenbe, 
1975)  (Gelenbe  et  al.,  1976)  (Duda,  1986)  which  combines  the  partial  differential  equation 
formulation  for  the  process  strictly  inside  the  boundaries,  with  a  discrete  state-space  model 
at  the  boundaries  themselves  (Gelenbe,  1975).  This  leads  to  a  more  accurate  model  of  the 
queueing  behavior  of  the  system  when  the  load  is  low,  or  when  the  queue  length  is  close  to 
the  maximum  value  allowed  by  a  finite  buffer. 

Diffusion approximations require that the first two moments of the interarrival and service times be known. These can be directly deduced from measurements or from other traffic models, such as the "On-Off" model often used in the literature (Heffes et al., 1986) (Sriram et al., 1986). The diffusion approximation approach we take for an ATM multiplexer buffer of size $B$ considers a random process $\{X(t), t \ge 0\}$ to represent the buffer contents. In the open interval $]0,B[$ (excluding the two boundaries) it is a continuous random variable with probability density function $f(x,t)$ defined as:

$$f(x,t)\,dx = \Pr[x < X(t) \le x + dx], \quad x \in\, ]0,B[, \qquad (1)$$


while  at  the  boundaries  we  have: 


$$m(t) = \Pr[X(t) = 0], \qquad (2)$$

$$M(t) = \Pr[X(t) = B]. \qquad (3)$$


The parameters of the diffusion process inside $]0,B[$ are the "drift", or instantaneous average rate of change:

$$\mu = \lim_{\Delta t \to 0} \frac{E\left[X(t+\Delta t) - X(t) \mid X(t) \in\, ]0,B[\right]}{\Delta t} \qquad (4)$$

and the instantaneous variance of the change in $X(t)$:

$$\alpha = \lim_{\Delta t \to 0} \frac{\mathrm{Var}\left[X(t+\Delta t) - X(t) \mid X(t) \in\, ]0,B[\right]}{\Delta t} \qquad (5)$$
and $\alpha$ will depend on the variance of the interarrival and service times at the ATM multiplexer. Since the service time is constant, due to the fixed length of the cells being transmitted, $\alpha$ will only depend on the variance of the interarrival times. Assuming time-independent traffic characteristics, let the mean aggregate cell arrival rate to the buffer be $\lambda$ and the multiplexer cell transmission rate be $C$, both given in cells per second. Then we will have:

$$\mu = \lambda - C. \qquad (6)$$


In the instantaneous return process model, when the queue length reaches the lower boundary of the interval at $x = 0$, it remains there for a random length of time which we denote $h$. This time clearly represents a period when the buffer is empty, and it ends as soon as a cell arrives at the multiplexer. At that time, say $\tau$, the process $X(t)$ jumps from $X(\tau) = 0$ to $X(\tau^+) = +1$. Similarly, at the upper boundary $x = B$ the random time spent at the boundary will be denoted by $H$, while the jump of the queue length process will be from the value $B$ to the value $B - 1$, representing the end of a service or transmission epoch for a cell, resulting in a decrease of the buffer length by 1. This behavior results in the following system of equations for the ATM multiplexer queue length process, as derived in (Gelenbe, 1975) in steady state, where we have dropped the dependence on $t$:

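$$-\mu \frac{df(x)}{dx} + \frac{\alpha}{2} \frac{d^2 f(x)}{dx^2} + \frac{m}{E[h]}\, \delta(x - 1) + \frac{M}{E[H]}\, \delta(x - B + 1) = 0 \qquad (7)$$

$$\lim_{x \to 0^+} \left[-\mu f(x) + \frac{\alpha}{2} \frac{df(x)}{dx}\right] - \frac{m}{E[h]} = 0, \qquad \lim_{x \to 0^+} f(x) = 0, \qquad (8)$$

$$\lim_{x \to B^-} \left[-\mu f(x) + \frac{\alpha}{2} \frac{df(x)}{dx}\right] + \frac{M}{E[H]} = 0, \qquad \lim_{x \to B^-} f(x) = 0, \qquad (9)$$

where $\delta(x)$ is the Dirac delta function. Also the probabilities must sum to 1:

$$m + M + \int_0^B f(x)\,dx = 1 \qquad (10)$$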


These equations have a simple interpretation. Equation (7) represents the stationary behavior of the motion of the queue length process in the interval $]0,B[$, and the effect of the jumps of the process $X(t)$ from 0 and $B$ into the interval. On the other hand, (8) represents the depletion of the probability mass $m$ at the lower boundary due to the jumps to $+1$ at the end of the holding time at the lower boundary, as well as the flow of probability mass from inside the interval $]0,B[$ towards the lower boundary. Equation (9) has a similar interpretation.
2.1 Queue length distribution of finite capacity

The  above  equations  may  be  solved  directly  (Gelenbe,  1975)  to  obtain: 


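$$f(x) = \begin{cases} \Phi\left[1 - e^{\gamma x}\right], & 0 < x \le 1 \\ \Phi\left[e^{-\gamma} - 1\right]e^{\gamma x}, & 1 \le x \le B-1 \\ \Phi\left[e^{\gamma(x-B)} - 1\right]e^{\gamma(B-1)}, & B-1 \le x < B \end{cases} \qquad (11)$$

with $m$ and $M$, the stationary probability masses at 0 and at $B$ respectively, given by:

$$m = -\mu E[h]\,\Phi, \qquad (12)$$

$$M = -\mu E[H]\,\Phi\,e^{\gamma(B-1)}, \qquad (13)$$

where $\gamma = 2\mu/\alpha$, and

$$\Phi = \frac{1}{(1 - \mu E[h]) - (1 + \mu E[H])\,e^{\gamma(B-1)}} \qquad (14)$$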
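As a numerical sanity check on (11)-(14), the following Python sketch evaluates the finite-buffer solution and verifies the normalization (10) with a plain midpoint rule. The function names and the parameter values are hypothetical; the sketch assumes a negative drift $\mu$ (a stable queue) and $B \ge 2$, so that all three pieces of (11) exist.

```python
import math

def finite_buffer_diffusion(mu, alpha, Eh, EH, B):
    """Return (m, M, f) from equations (11)-(14).

    mu: drift, alpha: variance parameter, Eh / EH: mean holding times at
    x = 0 and x = B, B: buffer size in cells (assumed B >= 2).
    """
    if B < 2:
        raise ValueError("the piecewise form (11) assumes B >= 2")
    gamma = 2.0 * mu / alpha
    eB = math.exp(gamma * (B - 1))
    phi = 1.0 / ((1.0 - mu * Eh) - (1.0 + mu * EH) * eB)   # (14)
    m = -mu * Eh * phi                                       # (12): mass at x = 0
    M = -mu * EH * phi * eB                                  # (13): mass at x = B

    def f(x):
        """Density (11) on the open interval ]0, B[."""
        if 0.0 < x <= 1.0:
            return phi * (1.0 - math.exp(gamma * x))
        if 1.0 < x <= B - 1:
            return phi * (math.exp(-gamma) - 1.0) * math.exp(gamma * x)
        if B - 1 < x < B:
            return phi * (math.exp(gamma * (x - B)) - 1.0) * eB
        return 0.0

    return m, M, f


def integrate_midpoint(g, a, b, n=100_000):
    """Midpoint rule; adequate for this continuous, piecewise smooth density."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    m, M, f = finite_buffer_diffusion(mu=-0.1, alpha=1.0, Eh=2.0, EH=0.5, B=20)
    total = m + M + integrate_midpoint(f, 0.0, 20.0)
    print(f"m={m:.4f}  M={M:.2e}  m + M + integral = {total:.6f}")
```

Integrating (11) piece by piece gives $\int_0^B f(x)\,dx = \Phi\left[1 - e^{\gamma(B-1)}\right]$, which together with (12) and (13) reproduces (14), so the printed total should be 1 up to quadrature error.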

2.2  Queue  length  distribution  of  infinite  capacity 


If  we  consider  a  diffusion  process  on  the  whole  non-negative  real  line,  i.e.  as  if  the  queue 
length  were  infinite,  with  holding  time  h  only  at  x  =  0,  we  will  have  the  following  formula 
for  an  unbounded  queue  diffusion  approximation  model: 


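$$f(x) = \begin{cases} \Phi\left[1 - e^{\gamma x}\right], & 0 < x \le 1 \\ \Phi\left[e^{-\gamma} - 1\right]e^{\gamma x}, & 1 \le x \end{cases} \qquad (15)$$

$$m = 1 - \Phi \qquad (16)$$

$$\Phi = \frac{1}{1 - \mu E[h]} \qquad (17)$$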

In the following sections, we derive practical applications of the diffusion approximation models for both the bounded and the unbounded queue:

•  Finite  Buffer  Diffusion  Cell  Loss  Estimate  (FBDCLE); 

•  Infinite  Buffer  Diffusion  Cell  Loss  Estimate  (IBDCLE). 

In order to make use of these diffusion models we need to determine the parameters $\mu$, $\alpha$, $E[h]$ and $E[H]$ from the arrival and service characteristics of the ATM multiplexer. From an engineering viewpoint, various strategies can be used to obtain $E[h]$ and $E[H]$; more detail is given when we derive the FBDCLE and the IBDCLE.
3  FINITE  BUFFER  ESTIMATE  -  FBDCLE

In general, the distributions of the residence times of moderately complex finite capacity queueing models at the lower and upper boundaries 0 and $B$ are unknown. Their characterization can be quite complex and depends on the arrival process, the buffer size and the service process. Thus we will have to calculate $E[h]$ and $E[H]$ in a heuristic but plausible manner.


3.1  Calculation  of  E[h]  and  E[H] 

If the arrival process can be approximated by a Poisson process with arrival rate $\lambda$, it follows that $E[h] = \lambda^{-1}$. Since the arrival traffic to an ATM multiplexer is made up of many superposed sources, this approximation may be acceptable when the number of sources is large. In our simulations it turns out that this heuristic slightly underestimates the actual value of $E[h]$ for superposed “On-Off” sources.

Recall that the time for transmitting one cell is $C^{-1}$. Now assume that at instant $t$ the transmission of a cell begins and that $X(t) = B - 1$. At some instant $t + Z$ before $t + C^{-1}$ another arrival occurs, so that now $X(t+Z) = B$. Then $H$, the random variable representing the holding time at the upper boundary, has the following distribution:

$$\Pr[H \le v] = \Pr\left[\tfrac{1}{C} - Z \le v \,\middle|\, Z \le \tfrac{1}{C}\right] = \frac{\Pr\left[\tfrac{1}{C} - Z \le v \text{ and } Z \le \tfrac{1}{C}\right]}{\Pr\left[Z \le \tfrac{1}{C}\right]} \qquad (18)$$


We make the simplifying approximation that the arrival process is Poisson with rate $\lambda$ so as to complete the computation, on the basis that this is justified when the arriving traffic results from the superposition of many independent sources. Then


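$$\Pr\left[Z \le \tfrac{1}{C}\right] = 1 - e^{-\lambda/C}, \qquad (19)$$

and, for $0 \le v \le 1/C$,

$$\Pr\left[\tfrac{1}{C} - Z \le v \text{ and } Z \le \tfrac{1}{C}\right] = \Pr\left[\tfrac{1}{C} - v \le Z \le \tfrac{1}{C}\right] = e^{-\lambda/C}\left[e^{\lambda v} - 1\right]. \qquad (20)$$

Thus

$$\Pr[H \le v] = \frac{e^{\lambda v} - 1}{e^{\lambda/C} - 1}, \qquad (21)$$

with density function

$$f_H(v) = \begin{cases} \dfrac{\lambda e^{\lambda v}}{e^{\lambda/C} - 1}, & 0 \le v \le \tfrac{1}{C} \\ 0, & \text{elsewhere} \end{cases} \qquad (22)$$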
We can now derive the estimate for the average holding time at the upper boundary:

$$E[H] = \int_0^{1/C} v f_H(v)\,dv = \frac{1/C}{1 - e^{-\lambda/C}} - \frac{1}{\lambda} \qquad (23)$$
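Under the same Poisson assumption, (18)-(23) can be checked by simulation. The sketch below draws $Z$ from an exponential distribution with rate $\lambda$, keeps only the draws with $Z \le 1/C$ as in (18), and compares the sample mean of $H = 1/C - Z$ with the closed form (23). Function names and parameter values are illustrative assumptions.

```python
import math
import random

def mean_H_closed_form(lam, C):
    """Equation (23): E[H] = (1/C) / (1 - exp(-lam/C)) - 1/lam."""
    # -expm1(-x) evaluates 1 - exp(-x) accurately even when x = lam/C is small
    return (1.0 / C) / (-math.expm1(-lam / C)) - 1.0 / lam

def sample_H(lam, C, rng):
    """Draw H = 1/C - Z conditioned on Z <= 1/C, with Z ~ Exp(lam), as in (18)."""
    while True:
        z = rng.expovariate(lam)
        if z <= 1.0 / C:
            return 1.0 / C - z

if __name__ == "__main__":
    rng = random.Random(42)
    lam, C = 0.8, 1.0          # cells per time unit (illustrative values)
    n = 100_000
    estimate = sum(sample_H(lam, C, rng) for _ in range(n)) / n
    print(f"closed form {mean_H_closed_form(lam, C):.4f}, simulation {estimate:.4f}")
```

For light load ($\lambda/C \to 0$) the conditional distribution of $H$ becomes uniform on $[0, 1/C]$, so (23) tends to $1/(2C)$, which is a convenient limiting check.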

Of course, the first and second moments of the interarrival times are also needed in order to compute the density function $f(x)$ and the probability masses $m$ and $M$. These moments can be obtained from practical measurements or from the precise traffic characteristics we use, which are discussed in Section 5.

3.2  Estimating  the  cell  loss  ratio 

The long-run cell loss ratio $L$ is the ratio of the number of cells lost at the entrance to the multiplexer due to buffer overflow to the total number of cells arriving at the multiplexer. It is the primary measure of interest in this study, and it needs to be estimated both accurately and conservatively. Thus what is needed is in fact a tight upper bound, rather than a relatively accurate value which may underestimate $L$. Clearly, cells are lost only when the buffer is full, i.e. when the buffer length has attained size $B$, in which case all arriving cells are lost. Thus the cell loss ratio in steady state may be written as:

$$L = \lim_{t\to\infty} M(t)\,\Pr\left[N(t, t+H) > 1 \mid X(t) = B\right], \qquad (24)$$

where $N(t, t+H)$ is the number of arrivals in the open interval $(t, t+H)$. If the arrival process is stationary in time and independent of the buffer length, then in steady state the expected cell loss ratio is:

$$L = M\,\Pr\left[N(t, t+H) > 1\right]. \qquad (25)$$

There are several difficulties with using this expression for real traffic, including estimating $H$ and the distribution of the number of arrivals in the interval during which the buffer is full. However, we do know that $H \le 1/C$. We have found that $L^*_{FB}$ given below, which we call the Finite Buffer Diffusion Cell Loss Estimate (FBDCLE), is a useful and tight upper bound which yields cell loss ratio values within the same order of magnitude as the values measured by simulation with various forms of “On-Off” traffic:

$$L \le L^*_{FB} = M\,\Pr\left[N\!\left(t, t + \tfrac{1}{C}\right) > 1\right]. \qquad (26)$$

The quality of the estimate $L^*_{FB}$ has been tested by simulation with a very wide variety of “On-Off” traffic models, as shown in the simulation results we present.


4  INFINITE  BUFFER  ESTIMATE  -  IBDCLE 

As indicated previously, the exact average residence times $E[h]$ and $E[H]$ of the finite capacity queueing model at the lower and upper boundaries are not known in general and are difficult to obtain. Thus we consider an alternative formulation, an infinite capacity queueing model, in which we only deal with the holding time at the lower boundary $x = 0$. The key quantity used for estimating the cell loss probability is now the stationary probability that the diffusion process exceeds the value $B$:


$$P_B = \Pr[X > B] \qquad (27)$$

From (15) we estimate the buffer overflow probability $P_B$:

$$P_B = \frac{\Phi}{\gamma}\left[1 - e^{-\gamma}\right]e^{\gamma B} \qquad (28)$$
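The overflow probability (28), with $\Phi$ taken from (17), is simple to evaluate. The following sketch (hypothetical function names, illustrative parameters) computes $P_B$ and cross-checks it against a direct numerical integration of the tail of the density (15); it assumes $\mu < 0$, since otherwise the unbounded diffusion has no stationary distribution.

```python
import math

def overflow_probability(mu, alpha, Eh, B):
    """P_B = Pr[X > B] for the unbounded diffusion, from (15), (17) and (28)."""
    if mu >= 0:
        raise ValueError("a stationary solution requires a negative drift mu")
    gamma = 2.0 * mu / alpha
    phi = 1.0 / (1.0 - mu * Eh)                                             # (17)
    return (phi / gamma) * (1.0 - math.exp(-gamma)) * math.exp(gamma * B)  # (28)

def tail_by_integration(mu, alpha, Eh, B, width=200.0, n=200_000):
    """Midpoint integration of the x >= 1 branch of (15) over [B, B + width]."""
    gamma = 2.0 * mu / alpha
    phi = 1.0 / (1.0 - mu * Eh)
    h = width / n
    return h * sum(phi * (math.exp(-gamma) - 1.0) * math.exp(gamma * (B + (i + 0.5) * h))
                   for i in range(n))

if __name__ == "__main__":
    print(overflow_probability(-0.2, 2.0, 1.5, 10), tail_by_integration(-0.2, 2.0, 1.5, 10))
```

Because the tail decays like $e^{\gamma x}$ with $\gamma < 0$, truncating the integral at $B + 200$ is harmless for these parameters.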

If $R(t)$ is the instantaneous cell arrival rate, then the new diffusion cell loss ratio estimate $L$ is:

$$L = P_B\,\frac{E\left[(R(t) - C)^+\right]}{E[R(t)]} \qquad (29)$$

since cell loss can only occur when the arrival rate exceeds the multiplexer's service capacity $C$ while the buffer length is at least $B$.


4.1  Choice  of  $E[h]$

It is known that for the GI/GI/1 queue with arrival rate $\lambda$ the average idle time $E[h]$ satisfies (Medhi, 1991):

$$E[h] \ge E[h]^* = \frac{1}{\lambda} - \frac{1}{C} \qquad (30)$$

We therefore approximate $E[h]$ by its lower bound $E[h]^*$. All other things being equal, the resulting probability $P^*_B$ that the queue length exceeds $B$ will then be larger than the real value $P_B$, because the process spends less time at $x = 0$ and is therefore more likely to exceed $B$. This can also be proved easily by substituting inequality (30) into (28).
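The substitution mentioned above can be spelled out in one line. Combining (17) and (28),

$$P_B = \frac{\left(1 - e^{-\gamma}\right)e^{\gamma B}}{\gamma\,\left(1 - \mu E[h]\right)}, \qquad \mu < 0 \;\Rightarrow\; \frac{\left(1 - e^{-\gamma}\right)e^{\gamma B}}{\gamma} > 0 \ \text{ and } \ 1 - \mu E[h] = 1 + |\mu|\,E[h] \ \text{ increasing in } E[h],$$

so that $E[h]^* \le E[h]$ implies $1 - \mu E[h]^* \le 1 - \mu E[h]$, and hence $P^*_B \ge P_B$.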


4.2  Estimating  the  cell  loss  ratio 

The estimate $L^*_{IB}$, which we call the Infinite Buffer Diffusion Cell Loss Estimate (IBDCLE), is obtained by replacing $E[h]$ by $E[h]^*$ in equation (28), i.e. by using $P^*_B$ in place of $P_B$ in (29). The IBDCLE is:

$$L^*_{IB} = P^*_B\,\frac{E\left[(R(t) - C)^+\right]}{\lambda} \qquad (31)$$

since $E[R(t)] = \lambda$ if $R(t)$ is stationary.
5  CELL  LOSS  ESTIMATES  FOR  “ON-OFF”  MULTICLASS  TRAFFIC

In  this  section  we  present  the  numerical  and  simulation  results  to  evaluate  the  accuracy  of 
FBDCLE  and  IBDCLE  for  a  wide  variety  of  “On-Off”  traffic  models.  Much  of  the  work  on 
ATM  traffic  analysis  and  cell  loss  estimates  is  based  on  the  “On-Off”  traffic  model  and  on 
the superposition of such traffic streams (Heffes et al., 1986; Sriram et al., 1986). Thus it is
of particular interest to evaluate the accuracy of our diffusion cell loss estimates
for this specific class of practically useful traffic models. In order to do so, we will first derive
the  appropriate  traffic  parameters  to  be  used  in  the  diffusion  approximation. 

5.1  The  traffic  model 

Consider  first  a  single  user  u  whose  traffic  follows  a  simple  “On-Off”  behavior.  This  user  u 
either  sends  traffic  into  the  network  at  a  constant  peak  rate  Ru  during  the  “On”  period,  or 
it  sends  no  traffic  at  all  during  the  “Off”  period.  The  following  notation  describes  this  traffic 
model: 

•  $R_u$ - peak traffic rate during the “On” period, $T_u = 1/R_u$;

•  $\theta_u^{-1}$ - average length of the “Off” period;

•  $\beta_u^{-1}$ - average length of the “On” period;

•  $a_u = \theta_u/(\beta_u + \theta_u)$ - source activity.

The durations of the successive On and Off periods are assumed to be independent, so that the cell arrival process from a single such source is a renewal process. The cell interarrival time will be denoted by $Y_u$; let $F_u(x) = \Pr[Y_u \le x]$, so that (Heffes et al., 1986):

$$F_u(x) = \left[(1 - \beta_u T_u) + \beta_u T_u\left(1 - e^{-\theta_u(x - T_u)}\right)\right]U(x - T_u) \qquad (32)$$

where $U(x)$ is the unit step function. The Laplace-Stieltjes transform (LST) of the interarrival time distribution is given by:

$$f_u^*(s) = \int_0^\infty e^{-sx}\,dF_u(x) = \left[1 - \beta_u T_u + \frac{\beta_u T_u \theta_u}{s + \theta_u}\right]e^{-sT_u} \qquad (33)$$

The mean arrival rate of cells from source $u$ is then:

$$\lambda_u = \frac{-1}{f_u^{*\prime}(0)} = \frac{1}{T_u + \beta_u T_u/\theta_u} = \frac{a_u}{T_u} = a_u R_u \qquad (34)$$

Let $A_u(t)$ denote the number of arrivals of cells of user stream $u$ in the interval $[0, t)$. Then the squared coefficient of variation of the interarrival time from source $u$ is (Cox et al., 1966; Heffes et al., 1986):

$$c_u^2 = \frac{\mathrm{Var}[Y_u]}{E^2[Y_u]} = \frac{\mathrm{Var}[A_u(t)]}{E[A_u(t)]} \qquad (35)$$




Part  Six  Models  of  ATM  Switches 


which leads to (Heffes et al., 1986):

$$c_u^2 = \frac{1 - (1 - \beta_u T_u)^2}{(\beta_u T_u + \theta_u T_u)^2}. \qquad (36)$$
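Equations (32)-(36) can be checked by sampling interarrival times directly from (32): after each cell the next one follows after $T_u$, and with probability $\beta_u T_u$ an exponential Off period of mean $\theta_u^{-1}$ is inserted first. The sketch below compares the sample rate and squared coefficient of variation with (34) and (36); the names and parameter values are hypothetical, and it assumes $0 < \beta_u T_u \le 1$.

```python
import math
import random

def onoff_rate_and_scv(R, beta, theta):
    """Mean cell rate (34) and interarrival SCV (36) of one On-Off source."""
    T = 1.0 / R
    p = beta * T                  # probability that a cell is the last of its On period
    if not 0.0 < p <= 1.0:
        raise ValueError("the model (32) needs 0 < beta*T <= 1")
    a = theta / (beta + theta)    # source activity
    lam = a * R                                            # (34)
    c2 = (1.0 - (1.0 - p) ** 2) / (p + theta * T) ** 2     # (36)
    return lam, c2

def sample_interarrival(R, beta, theta, rng):
    """One draw from F_u in (32)."""
    T = 1.0 / R
    y = T
    if rng.random() < beta * T:        # the On period ends after this cell
        y += rng.expovariate(theta)    # exponential Off period
    return y

if __name__ == "__main__":
    rng = random.Random(7)
    R, beta, theta = 10.0, 2.0, 0.5    # peak rate, 1/mean On, 1/mean Off (illustrative)
    ys = [sample_interarrival(R, beta, theta, rng) for _ in range(200_000)]
    mean = math.fsum(ys) / len(ys)
    var = math.fsum((y - mean) ** 2 for y in ys) / len(ys)
    lam, c2 = onoff_rate_and_scv(R, beta, theta)
    print(f"rate: {1 / mean:.3f} vs {lam:.3f}; SCV: {var / mean ** 2:.3f} vs {c2:.3f}")
```

With these values (34) gives $\lambda_u = 2$ and (36) gives $c_u^2 = 0.36/0.0625 = 5.76$, showing how strongly an “On-Off” source departs from Poisson behavior ($c^2 = 1$).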

Since $E[A_u(t)] = \lambda_u t$, we can write (35) as:

$$\frac{\mathrm{Var}[A_u(t)]}{t} = \lambda_u \frac{\mathrm{Var}[Y_u]}{E^2[Y_u]} = \lambda_u c_u^2 \qquad (37)$$

Now, if the total arrival process to the ATM multiplexer results from the superposition of $N$ uncorrelated “On-Off” sources of the renewal type discussed above, the resulting counting process $A(t) = \sum_{u=1}^{N} A_u(t)$ has the obvious properties:

$$E[A(t)] = \sum_{u=1}^{N} E[A_u(t)], \qquad (38)$$

$$\mathrm{Var}[A(t)] = \sum_{u=1}^{N} \mathrm{Var}[A_u(t)], \qquad (39)$$

and

$$E[A(t)] = \sum_{u=1}^{N} \lambda_u t, \qquad (40)$$

$$\mathrm{Var}[A(t)] = \sum_{u=1}^{N} \lambda_u c_u^2\, t. \qquad (41)$$


Let $D(t, t+\tau)$ denote the number of departures in an interval $[t, t+\tau)$ when the queue is non-empty. Note that if the multiplexer queue is non-empty, then the service (emptying) process at the queue is independent of the arrival process. Thus we have:

$$E[X(t+\Delta t) - X(t) \mid X(t) > 0] = E[A(t+\Delta t) - A(t)] - E[D(t+\Delta t) - D(t)] \qquad (42)$$

and

$$\mathrm{Var}[X(t+\Delta t) - X(t) \mid X(t) \in\, ]0, B[\,] = \mathrm{Var}[A(t+\Delta t) - A(t)] + \mathrm{Var}[D(t+\Delta t) - D(t)] \qquad (43)$$

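so that

$$\mu = \lim_{\Delta t\to 0} \frac{E[X(t+\Delta t) - X(t) \mid X(t) \in\, ]0, B[\,]}{\Delta t} = \sum_{u=1}^{N} \lambda_u - C, \qquad (44)$$

$$\alpha = \lim_{\Delta t\to 0} \frac{\mathrm{Var}[X(t+\Delta t) - X(t) \mid X(t) \in\, ]0, B[\,]}{\Delta t} = \sum_{u=1}^{N} \lambda_u c_u^2. \qquad (45)$$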
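For a superposition of independent sources, (44) and (45) only require each source's rate and SCV. The following self-contained sketch (hypothetical function name) recomputes (34) and (36) for each source and sums them; sources are described as $(R, \beta, \theta)$ triples, and $C$ is the link rate in the same cell-based units. The deterministic service contributes $C$ to the drift and nothing to the variance.

```python
def diffusion_parameters(sources, C):
    """Drift mu (44) and variance parameter alpha (45) for independent On-Off sources.

    sources: iterable of (R, beta, theta) triples (peak rate, 1/mean On, 1/mean Off).
    C: service rate of the multiplexer, in the same units as R.
    """
    total_rate = 0.0
    alpha = 0.0
    for R, beta, theta in sources:
        T = 1.0 / R
        p = beta * T
        lam = R * theta / (beta + theta)                      # (34): a_u * R_u
        c2 = (1.0 - (1.0 - p) ** 2) / (p + theta * T) ** 2   # (36)
        total_rate += lam
        alpha += lam * c2                                     # (45): sum of lam_u * c_u^2
    mu = total_rate - C                                       # (44): sum of lam_u minus C
    return mu, alpha

if __name__ == "__main__":
    mu, alpha = diffusion_parameters([(10.0, 2.0, 0.5)] * 5, C=15.0)
    print(mu, alpha)   # a stable queue needs mu < 0
```

The resulting pair $(\mu, \alpha)$ is what (11)-(17) and (28) take as input, together with $E[h]$ and $E[H]$.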
We now have all the parameters needed by the diffusion model described in Sections 2, 3 and 4 when it is used for superposed “On-Off” traffic sources, and can use it to calculate the FBDCLE and IBDCLE formulae given in (26) and (31).

5.2  The  distribution  of  the  number  of  arrivals 

In order to calculate the FBDCLE, the quantity $\Pr[N(t, t + 1/C) > 1]$ must be obtained. To do so, we consider the general case of arrival traffic composed of multiple “On-Off” sources of $K$ different types. Sources of the same type have the same set of parameters, and $N_k$ denotes the number of type-$k$ sources, each with the same peak traffic rate $R_k$ and activity $a_k$. Notice that here we use the subscript $k$ to denote a user type, rather than the subscript $u$ to denote an individual user. The total number of users or sources is then $N = \sum_{k=1}^{K} N_k$.
The  average  arrival  rate  of  cells  will  then  be: 

$$\lambda = \sum_{k=1}^{K} a_k N_k R_k \qquad (46)$$


Now let $Z_k(t)$ be the random variable denoting the number of sources of type $k$ which are “On” at time $t$. Since the sources are independent and stationary, for large enough $t$ we have:

$$\Pr[Z_1(t) = n_1, \ldots, Z_K(t) = n_K] = \prod_{k=1}^{K} \binom{N_k}{n_k} a_k^{n_k}(1 - a_k)^{N_k - n_k} \qquad (47)$$

On the other hand, for small enough $1/C$:

$$N\!\left(t, t + \tfrac{1}{C}\right) = \frac{Z_1(t)R_1 + \cdots + Z_K(t)R_K}{C}, \qquad (48)$$

so that:

$$\Pr\left[N\!\left(t, t + \tfrac{1}{C}\right) > 1\right] = \Pr\left[Z_1(t)R_1 + \cdots + Z_K(t)R_K > C\right], \qquad (49)$$

which  can  be  computed  from  the  distribution  (47).  For  homogeneous  traffic,  i.e.  when  all 
sources  are  of  just  one  type,  we  simply  have  K  =  1  and: 

$$\Pr\left[N\!\left(t, t + \tfrac{1}{C}\right) > 1\right] = 1 - \sum_{n_1=0}^{\lfloor C/R_1 \rfloor} \binom{N_1}{n_1} a_1^{n_1}(1 - a_1)^{N_1 - n_1}. \qquad (50)$$
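For a modest number of classes, (47) and (49) can be evaluated by brute-force enumeration of the vector of On counts, and (50) serves as a cross-check in the homogeneous case. The sketch below does this and also forms $L^*_{FB}$ from (26) for a given mass $M$; the class description as hypothetical $(N_k, a_k, R_k)$ triples and all function names are assumptions. The enumeration visits $\prod_k (N_k + 1)$ states, which is fine for two or three classes.

```python
import itertools
import math

def on_count_distribution(classes):
    """Yield ((n_1, ..., n_K), probability) pairs from the product form (47).

    classes: list of (N_k, a_k, R_k): number of sources, activity, peak rate.
    """
    for ns in itertools.product(*(range(N + 1) for N, _, _ in classes)):
        p = 1.0
        for n, (N, a, _) in zip(ns, classes):
            p *= math.comb(N, n) * a ** n * (1.0 - a) ** (N - n)
        yield ns, p

def prob_rate_exceeds(classes, C):
    """Pr[Z_1 R_1 + ... + Z_K R_K > C], i.e. Pr[N(t, t + 1/C) > 1] in (49)."""
    total = 0.0
    for ns, p in on_count_distribution(classes):
        if sum(n * R for n, (_, _, R) in zip(ns, classes)) > C:
            total += p
    return total

def prob_rate_exceeds_homogeneous(N1, a1, R1, C):
    """Closed form (50) for a single class (K = 1)."""
    upper = min(math.floor(C / R1), N1)
    return 1.0 - sum(math.comb(N1, n) * a1 ** n * (1.0 - a1) ** (N1 - n)
                     for n in range(upper + 1))

def fbdcle(M, classes, C):
    """FBDCLE (26): L*_FB = M * Pr[N(t, t + 1/C) > 1]."""
    return M * prob_rate_exceeds(classes, C)
```

In a complete calculation, $M$ would come from (13), with $\mu$ and $\alpha$ from (44)-(45) and $E[H]$ from (23).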

For the IBDCLE we need $E[(R(t) - C)^+]$, to be used in (31), which is computed for the superposed multiclass “On-Off” traffic as:

$$E[(R(t) - C)^+] = \sum_{n_1=0}^{N_1} \cdots \sum_{n_K=0}^{N_K} (n_1 R_1 + \cdots + n_K R_K - C)^+ \Pr[Z_1(t) = n_1, \ldots, Z_K(t) = n_K] \qquad (51)$$
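Similarly, (51) and the IBDCLE (31) can be sketched by enumerating the On counts, with $\lambda$ taken from (46). In this sketch $P^*_B$ is passed in as an input (it would come from (28) with $E[h]^*$ in place of $E[h]$); the function names and the $(N_k, a_k, R_k)$ class description are assumptions for illustration.

```python
import itertools
import math

def expected_excess_rate(classes, C):
    """E[(R(t) - C)^+] from (51), with the On counts distributed as in (47)."""
    total = 0.0
    for ns in itertools.product(*(range(N + 1) for N, _, _ in classes)):
        prob = 1.0
        rate = 0.0
        for n, (N, a, R) in zip(ns, classes):
            prob *= math.comb(N, n) * a ** n * (1.0 - a) ** (N - n)
            rate += n * R
        total += max(rate - C, 0.0) * prob
    return total

def ibdcle(P_B_star, classes, C):
    """IBDCLE (31): L*_IB = P*_B * E[(R(t) - C)^+] / lambda, with lambda from (46)."""
    lam = sum(a * N * R for N, a, R in classes)
    return P_B_star * expected_excess_rate(classes, C) / lam
```

For example, two sources with $a = 0.5$ and $R = 10$ on a link with $C = 15$ overload the link only when both are On (probability 0.25, excess rate 5), so $E[(R(t)-C)^+] = 1.25$ and $\lambda = 10$.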
Figure 1 Cell loss probability vs. buffer size: comparison of simulation and DCLE for homogeneous sources under varying load (load = aggregate mean arrival rate / link capacity). [Plot not reproduced; panel annotation: link capacity C = 150 Mbits/sec.]


5.3  Comparison  of  numerical  and  simulation  results

In this section we present numerical and simulation results to evaluate the accuracy of the FBDCLE and IBDCLE. The validation of our new diffusion model focuses on comparing the cell loss probability predicted by the FBDCLE and IBDCLE with that obtained by simulation for a wide variety of “On-Off” traffic models. In our simulations, each run was independently replicated 20 times, and each run included the transmission of $10^7$ cells. Confidence intervals are calculated using the Student t-distribution at the 98% confidence level, so that the simulation results are of sufficiently high statistical quality. The width of the resulting confidence intervals is also shown in the figures.
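For reference, a 98% Student-t confidence interval over 20 independent replications can be computed as below. The sketch assumes two-sided intervals, so it uses the 0.99 quantile of the t-distribution with 19 degrees of freedom (about 2.539, hard-coded to keep the example dependency-free); the function name and sample values are hypothetical.

```python
import math
import statistics

T_099_DF19 = 2.539   # 0.99 quantile of Student's t with 19 degrees of freedom

def replication_ci(estimates, t_quantile=T_099_DF19):
    """Return (mean, half-width) of a two-sided t confidence interval.

    The default quantile is only valid for 20 replications at 98% confidence.
    """
    n = len(estimates)
    mean = statistics.fmean(estimates)
    half_width = t_quantile * statistics.stdev(estimates) / math.sqrt(n)
    return mean, half_width

if __name__ == "__main__":
    # hypothetical per-replication cell loss ratios
    print(replication_ci([1.0e-4] * 10 + [3.0e-4] * 10))
```

Each replication contributes one cell loss ratio estimate, so the interval reflects run-to-run variability rather than within-run correlation.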
Figure 2 Cell loss probability vs. source activity (burstiness): comparison between simulations and the analytical approach using DCLE for homogeneous sources under varying load (load = aggregate mean rate / link capacity).


Figures  1  and  2  summarize  the  results  for  traffic  with  homogeneous  sources. 

In Figure 1 the cell loss probability ($\Pr[\text{cell loss}]$) is plotted versus buffer size $B$ for different loads, where the load is $\lambda/C$. The ATM multiplexer we consider here is a high speed link with capacity $C = 150$ Mbits/sec, fed by a collection of homogeneous and very bursty traffic sources with activity $a = 0.1$, which means that each source is at its peak rate 10% of the time and “Off” the rest of the time. The load is varied in Figure 1 simply by varying the number of sources. The results show that for cell loss ratios ranging from the high $10^{-5}$ range up to $10^{-1}$, the FBDCLE (the solid line) provides a conservative upper bound, while the IBDCLE (the dashed and dotted line) is an accurate predictor which remains well within the confidence intervals. Simulation results are shown by the dotted lines, while the 98% confidence intervals are shown as vertical lines.

Figure 3 Cell loss probability versus buffer size: comparison between simulation and DCLE for heterogeneous sources with varying load (load = aggregate mean rate / link capacity).

In Figure 2 similar results are observed when the source activity $a$ (or burstiness) is varied widely for different values of the load. Here each individual source generates cells at an average rate $\lambda_u = 1$ Mbits/sec and the buffer size is relatively small: $B = 20$ cells. In this case the IBDCLE is an accurate predictor over cell loss ratio values ranging from $5 \times 10^{-6}$ to $3 \times 10^{-2}$.

Figures 3 and 4 compare the FBDCLE and IBDCLE with simulation under heterogeneous traffic. We have chosen two types of sources: more bursty sources with $a_u = 0.1$ and less bursty sources with $a_u = 0.5$. Let $N_1$ and $N_2$ denote the numbers of sources with $a_u = 0.1$ and $a_u = 0.5$ respectively, so that $N = N_1 + N_2$.

Figure 4 Cell loss probability versus load: comparison between simulation and the DCLE for heterogeneous sources (load = aggregate mean rate / link capacity).

In Figure 3 we show matched results of simulations and the diffusion predictions for two different values of the load, under different combinations of $N_1$ and $N_2$ and with varying buffer size $B$. Note that the two classes are also characterized by two very different peak traffic rates: $R_1 = 10$ Mbits/sec and $R_2 = 2$ Mbits/sec. Again we see that the FBDCLE (the solid line) gives an upper bound, while the IBDCLE provides a very accurate prediction (the dashed and dotted line).

In Figure 4 the cell loss probability is plotted versus traffic load for a fixed buffer size $B = 100$, the same two-class traffic as in Figure 3, and five different load values obtained by varying the mixture of class 1 and class 2 traffic. The simulation results, together with their confidence intervals, again show excellent agreement with our infinite buffer estimate (IBDCLE), while the FBDCLE is again an upper bound, for cell loss ratio values ranging from $5 \times 10^{-5}$ to $3 \times 10^{-2}$.

We conclude from these results, and from others which are available but not reported here because of space limitations, that the FBDCLE can be used as a very conservative estimate of cell loss, while the IBDCLE is useful as an accurate predictor.
Abstract

This  work  addresses  model-based  evaluation  of  cell  loss  probabilities  for  an  ATM  switching 
element  with  a  shared  output  buffer.  The  incoming  traffic  to  the  switch  is  represented  by  the 
superposition  of  N  bursty  input  sources,  each  of  which  is  modeled  as  a  two-state  (On/Off) 
Markov  chain.  For  such  systems,  we  consider  an  integrated  approach  to  their  evaluation  that 
employs  both  exact  and  approximate  solutions.  The  exact  method  is  based  on  a  reduced 
Markov  model  obtained  by  lumping  the  states  according  to  certain  symmetries  of  the  traffic 
model.  However,  even  with  such  reduction,  numerical  solutions  are  feasible  only  if  the  switch 
dimensions  involved,  particularly  the  number  of  output  ports,  are  reasonably  small.  We  then 
introduce  a  new  approximate  solution  algorithm  that  can  be  applied  to  larger  switches.  By 
comparing  the  results  obtained  with  those  of  the  exact  method,  we  find  that  the  errors  of 
approximation  are  relatively  small.  Moreover,  due  to  the  iterative  nature  of  the  approximate 
solution  algorithm,  the  two  methods  can  be  integrated  so  as  to  yield  even  more  accurate 
results  with  less  execution  time. 


Keywords 

ATM  switch,  shared  buffers,  On/Off  sources 
1  INTRODUCTION

Services  both  realized  and  planned  for  broadband,  ATM-based,  ISDNs  impose  extremely 
severe  constraints  on  the  performance  of  ATM  switching  elements.  In  particular,  admissible 
cell loss probabilities as small as $10^{-9}$ (or even less) call for switch buffers that are sufficiently
large  to  guarantee  this  quality  of  service.  In  this  regard,  it  has  been  shown  (see  [1,2],  for 
example)  that  the  best  utilization  of  buffer  capacity  is  obtained  by  dynamically  sharing  cell 
storage  among  all  the  output  ports  of  the  switch.  This  permits  a  reduction  of  required  capacity 
(for  a  specified  admissible  cell  loss  probability)  relative  to  switches  which  employ  dedicated, 
fixed-capacity  queues  at  either  the  input  or  the  output.  However,  the  problem  of  evaluating  the 
loss  performance  of  a  shared-buffer  switch  is  difficult,  due  primarily  to  the  large  number  of 
internal  states  that  must  be  accounted  for  in  the  process,  even  when  the  switch  dimensions  are 
relatively  small.  Therefore,  various  studies  have  proposed  approximate  solutions  to  this 
problem,  assuming  further  (see  [3,4,5],  for  example)  that  traffic  sources  for  the  input  ports  are 
represented  by  independent  Bernoulli  arrival  processes,  thus  precluding  any  correlation 
between  cell  arrivals.  Among  such  investigations,  perhaps  the  most  widely  cited  is  [3]  which 
presumes  an  infinite  buffer  and  approximates  its  steady-state  occupancy  distribution  with  a 
Gamma  function.  The  parameters  of  the  Gamma  distribution  are  obtained  analytically  by 
computing  the  mean  and  variance  of  the  shared-buffer  occupancy  distribution.  This  method 
provides  a  practical  means  of  quickly  estimating  the  required  buffer  capacity  of  a  switch. 
However,  since  it  matches  only  the  first  two  moments  of  the  actual  distribution,  it  often  fails 
to  accurately  estimate  the  distribution's  “tail”,  i.e.,  the  probabilities  of  large  occupancies 
which  have  very  low  values. 

Another  simple  way  to  estimate  the  buffer  occupancy  distribution  of  a  shared  buffer  with 
uncorrelated traffic is by convolving the individual distributions of a number of Geo/D/1
queues.  Since  Geo/D/1  models  are  relatively  easy  to  solve  (as  discussed  in  [6],  for  example), 
this  method  is  also  attractive.  Other  studies,  such  as  those  of  [7,4],  suggest  more  complex 
heuristic  algorithms  that  typically  lead  to  more  precise  solutions. 

Although  Bernoulli  sources  are  convenient  by  virtue  of  their  simplicity,  a  more  realistic 
arrival  process  should  capture  correlation  that  exists  between  successive  arrivals  at  an  input 
port.  This  is  done  in  [8],  for  example,  by  employing  a  continuous-time  model  where  each 
input  source  is  modeled  by  an  interrupted  Poisson  process.  The  investigation  that  follows 
considers  two  discrete-time  models  of  a  shared-buffer  switch  subjected  to  bursty  (and  hence 
correlated)  traffic.  They  support  exact  and  approximate  solution  algorithms,  respectively; 
moreover,  we  find  that  these  methods  can  be  usefully  integrated  to  achieve  both  greater 
accuracy  and  reduced  execution  time  (when  compared  with  exclusive  use  of  the  approximate 
method). 

The  first  method,  referred  to  as  Algorithm  1,  provides  an  exact  solution  of  the  steady-state 
distribution  of  shared-buffer  occupancy  for  switches  of  limited  (but  not  trivial)  size.  This 
solution  is  based  on  an  efficient  representation  of  the  state  space  that  derives  from  certain 
symmetries  implied  by  the  underlying  assumptions.  Although  its  application  is  limited  in  the 
sense  noted  above  (its  execution  time  grows  exponentially  with  buffer  capacity),  it  is 
nevertheless  very  useful.  In  particular,  in  addition  to  providing  exact  results  for  the 
probabilities  of  rare  cell-loss  events,  it  can  serve  as  a  reference  for  assessing  the  nature  and 
magnitude  of  errors  that  result  from  approximate  analytic  models  and/or  solution  techniques.
Although  simulation  is  often  used  for  this  purpose,  such  practice  is  reasonable  only  if  the 
simulation  results  are  themselves  highly  accurate  (high  confidence  with  respect  to  small 
confidence  intervals). 

Further,  as  we  emphasize  in  the  development  that  follows,  it  is  sometimes  possible  to 
integrate  the  use  of  exact  solutions  with  certain  types  of  approximation  techniques,  thereby 
extending  the  scope  of  the  former.  For  example,  if  the  approximation  algorithm  is  iterative  in 
nature  (as  in  the  case  of  convolution,  or  more  specifically,  the  algorithm  we  consider  below) 
then  an  exact  solution  can  be  usefully  employed  for  the  initial  iteration.  This  leads  to  more 
accurate  approximate  evaluations,  even  for  realistically  large  switches  with  bursty  traffic. 

The approximation technique we propose (Algorithm 2) is new and is based on a
decomposition of the system into smaller systems involving fewer output ports. Comparisons
(see section 4) of Algorithm 2 results with those of Algorithm 1 (for small switch sizes and
very  low  loss  requirements)  and  with  simulation  data  (for  larger  systems  with  relatively  high 
losses)  reveal  that  the  approximations  obtained  are  reasonably  accurate.  These  results  are  then 
used  to  estimate  the  advantage,  in  terms  of  memory  saving,  of  a  shared-buffer  architecture 
relative  to  a  simpler  architecture  that  employs  a  dedicated,  fixed-capacity  buffer  for  each 
output  port.  With  such  estimations,  the  required  shared-buffer  capacities  for  very  low 
admissible  cell  loss  probabilities  can  be  likewise  estimated. 

Assumptions  concerning  the  switch  and  its  traffic  are  discussed  in  section  2.  This  is 
followed  by  descriptions  of  the  two  algorithms,  including  their  integration  (section  3)  and,  in 
turn,  a  presentation  of  the  results  just  mentioned  (section  4).  Section  5  then  summarizes  what 
was  accomplished,  with  appendices  A  and  B  providing  some  solution  details  that  were 
omitted  in  section  3. 

2  THE  SWITCH 

The  switch  considered  has  a  typical  shared-buffer  architecture,  i.e.,  memory  space  available  to 
store  ATM  cells  is  dynamically  shared  among  all  the  output  queues.  Incoming  cells  arrive 
from  N  input  ports  and  are  addressed  to  one  of  R  output  ports.  Provided  there  is  available 
space  in  a  common  buffer  of  finite  capacity  K  (the  maximum  number  of  cells  that  can  be 
stored),  via  an  appropriate  pointer  structure  (maintained  in  a  separate  memory  space),  a  cell  is 
then  stored  in  a  logical  FIFO  output  queue  corresponding  to  its  address.  A  cell  is  lost  if  and 
only  if  no  buffer  space  is  available  when  the  cell  arrives.  The  switch  is  assumed  to  operate 
synchronously  at  the  cell  level;  in  a  given  time  slot  (the  time  required  to  completely 
transmit/receive  a  cell  on  a  port  of  the  switch),  we  presume  that  the  following  two  operations 
take  place  in  the  order  indicated. 

Send:  For each non-empty logical queue, the least recently arrived cell is served
and its buffer space is freed.

Receive:  Each  incoming  cell  is  stored  in  the  buffer  (if  there  is  available  space)  and  the 
pointer  chain  is  appropriately  updated;  these  cells  will  be  served  in  the  next  slot. 
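The send/receive order can be made concrete with a small per-slot update. Because each logical queue is FIFO and every cell occupies one buffer unit, per-port cell counts suffice to track occupancy. The sketch assumes, as a simplification not stated in the text, that when the buffer fills during a slot the arriving cells are admitted in input-port order; the function name is hypothetical.

```python
def switch_slot(queue_lengths, destinations, K):
    """Advance a shared-buffer switch by one slot.

    queue_lengths: cells queued per output port at the start of the slot.
    destinations: output port of each cell arriving in this slot.
    K: shared buffer capacity in cells.
    Returns (new_queue_lengths, cells_lost).
    """
    # Send: every non-empty logical queue transmits its oldest cell.
    q = [max(x - 1, 0) for x in queue_lengths]
    occupied = sum(q)
    lost = 0
    # Receive: store each arriving cell while space remains, otherwise drop it.
    for port in destinations:
        if occupied < K:
            q[port] += 1
            occupied += 1
        else:
            lost += 1
    return q, lost
```

For instance, with $K = 4$, queues (2, 0, 1) and four arrivals addressed to ports 0, 0, 1, 2, the send step leaves one cell, three arrivals fill the buffer, and the fourth is lost.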

The traffic at each of the $N$ input ports is represented by a 2-state (On/Off) Markov chain, where these individual sources are assumed to be statistically independent. In the On state, a cell arrives with probability 1, while in the Off state there are no arrivals. The dwell times in each state (number of time slots between entry and exit) are geometrically distributed random variables, with means $L$ and $I$ for the On and the Off states, respectively.

The activity $p_{in}$ of an individual source is the fraction of time the source is in the On state and is given by $p_{in} = L/(I+L)$. The destination address of each cell (i.e. the output port it is queued to) is a random variable that is uniformly distributed over the $R$ output ports and is independent of the destinations of previously arrived cells. This assumption attempts to capture the situation where each input link carries the superposition of a large number of low bit-rate connections, each connection addressed to a possibly different output link.

As is well known, a purely random (memoryless) traffic model, where each input behaves as a Bernoulli source, is a special case of the model just described. Specifically, the above reduces to the Bernoulli case if $L = 1/(1 - p_{in})$ and $I = 1/p_{in}$. Finally, we let $\rho$ denote the offered load, as reflected by the utilization of an output port (assuming no cell losses), i.e., $\rho = N p_{in}/R$.
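The discrete-time On/Off source is easy to simulate: in each slot the source leaves the On state with probability $1/L$ and the Off state with probability $1/I$, which gives geometric dwell times with means $L$ and $I$ slots. The sketch below (hypothetical names, illustrative parameters) checks the activity $p_{in} = L/(I+L)$ empirically and evaluates the offered load $\rho$; it assumes $L \ge 1$ and $I \ge 1$.

```python
import random

def simulate_activity(L, I, slots, rng):
    """Fraction of slots spent On by a two-state source with mean dwell times L (On) and I (Off)."""
    if L < 1 or I < 1:
        raise ValueError("mean dwell times must be at least one slot")
    on = rng.random() < L / (I + L)      # start in the stationary distribution
    on_slots = 0
    for _ in range(slots):
        on_slots += on                   # one cell arrives in every On slot
        leave = 1.0 / L if on else 1.0 / I
        if rng.random() < leave:
            on = not on
    return on_slots / slots

def offered_load(N, p_in, R):
    """rho = N * p_in / R, with destinations uniform over the R output ports."""
    return N * p_in / R

if __name__ == "__main__":
    print(simulate_activity(5.0, 15.0, 200_000, random.Random(3)))  # close to 5/20 = 0.25
```

Setting $L = 1/(1-p)$ and $I = 1/p$ makes the probability of being On in the next slot equal to $p$ from either state, which is exactly the memoryless Bernoulli case.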


3  THE  ALGORITHMS 


As  mentioned  in  our  introductory  remarks,  we  choose  to  employ  both  exact  and  approximate 
model-based  methods  to  determine  the  steady-state  probability  distribution  of  shared-buffer 
occupancy, given the switch/traffic assumptions stated above. (Other measures, such as loss
probability, are then based on this distribution.) These are described in the subsections that
follow,  with  some  of  the  mathematical  details  being  deferred  to  appendix  A  (algorithm  1)  and 
appendix  B  (algorithm  2).  However,  before  proceeding  with  these  descriptions,  it  is  helpful  to 
introduce  some  assumptions,  terminology,  and  notation  which  are  common  to  both 
algorithms. 

Time is assumed to be discrete, where a time instant $t$ takes values in the set $T = \{0, 1, 2, \ldots\}$. The duration between successive instants $t$ and $t+1$ is interpreted as the $t$-th time slot, where the enumeration begins with time slot 0 and instant $t$ represents the beginning of slot $t$, i.e., it occurs before the intraslot “send” and “receive” operations described in section 2.

For each $t \in T$, we define the following random variables:

$M_t$ = number of sources in the On state during slot $t$

$X_{i,t}$ = number of cells in the buffer at time $t$ addressed to output port $i$, $i = 1, 2, \ldots, R$

$Y_t = \sum_{i=1}^{R} X_{i,t}$ = number of cells in the buffer at time $t$.

By these definitions, along with our earlier assumptions concerning the switch dimensions $N$ and $K$, for any $t \in T$ these variables are constrained to have integer values in the ranges

$$0 \le M_t \le N \tag{1}$$

$$0 \le X_{i,t} \le K, \quad i = 1, 2, \ldots, R \tag{2}$$

$$0 \le Y_t \le K \tag{3}$$


Since there is an arrival at an input port if and only if the source for that port is in the On state, $M_t$ is just the number of arrivals during slot $t$, including some that may be lost if the buffer is full. However, if the buffer capacity is at least $N$ (which we tacitly assume throughout the discussion) then, for all $t \in T$,

$$M_t \le Y_t \tag{4}$$

3.1  Algorithm  1  (Exact) 

Let $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{R,t})$ be the $R$-dimensional vector-valued random variable that represents the cell occupancy of the shared buffer at time $t$. If, further, we let $X$ denote the corresponding stochastic process, i.e., $X = \{X_t \mid t \in T\}$, then, without simplification, the state space $Q$ of $X$ quickly becomes computationally intractable, even for relatively small values of $R$ and $K$. For example, if $R = 8$ and $K = 40$ then $|Q| \approx 4 \cdot 10^8$, where $|Q|$ is the cardinality of the state space $Q$.

A key observation that drastically reduces the size of the state space (while still supporting an exact solution) is the following. Due to assumptions concerning i) the identical probabilistic nature of individual input sources and ii) the uniformity of cell routing, it is possible to lump (partition) the state space $Q$ according to the following equivalence relation. Letting $q_i$ denote the number of cells in the shared buffer that are destined for output port $i$ ($i = 1, 2, \ldots, R$), two states $q = (q_1, q_2, \ldots, q_R)$ and $q' = (q'_1, q'_2, \ldots, q'_R)$ are equivalent (and, hence, in the same lump) if and only if $q'$ is a permutation of $q$. Letting $\bar{Q}$ denote the resulting partition of $Q$, it then suffices to consider the corresponding reduced stochastic process $\bar{X} = \{\bar{X}_t \mid t \in T\}$ where, for all $t \in T$, $\bar{X}_t$ is the equivalence class (lump) that contains state $X_t$. For various choices of queue capacity $K$, the extent of this reduction is indicated in Tables 1 and 2, where the number of output ports is $R = 4$ and $R = 8$, respectively. Specifically, these tables compare the size of the original state space $Q$ with that of the reduced space $\bar{Q}$, showing that reductions of several orders of magnitude are possible.


Table 1: State-space size reduction if R = 4.

                   K = 10    K = 20      K = 40      K = 80
|Q| (original)     10^3      10^4        1.3·10^5    1.9·10^6
|Q̄| (reduced)      94        7.1·10^2    7.3·10^3    9.2·10^4

Table 2: State-space size reduction if R = 8.

                   K = 10      K = 20      K = 40      K = 80
|Q| (original)     4.3·10^4    3.1·10^6    3.8·10^8    [illegible]
|Q̄| (reduced)      1.3·10^2    2.0·10^3    7.3·10^4    8.1·10^5
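The entries of Tables 1 and 2 can be cross-checked with a short Python sketch. It assumes, as in appendix A, that $Q$ contains every $R$-tuple of non-negative integers with sum at most $K$ (counted by stars and bars) and that $\bar{Q}$ identifies tuples that are permutations of one another, so that $|\bar{Q}|$ is the number of partitions of $n \le K$ into at most $R$ parts. This counting approach is our own, not necessarily the one used by the authors.

```python
from math import comb


def full_size(R: int, K: int) -> int:
    """|Q|: number of R-tuples of non-negative integers whose sum is at most K."""
    return comb(K + R, R)


def lumped_size(R: int, K: int) -> int:
    """|Q-bar|: number of permutation classes of Q, i.e. partitions of n into
    at most R parts, summed over n = 0..K."""
    # ways[n] = partitions of n into parts of size <= R; by conjugation this
    # equals the number of partitions of n into at most R parts.
    ways = [1] + [0] * K
    for part in range(1, R + 1):
        for n in range(part, K + 1):
            ways[n] += ways[n - part]
    return sum(ways)


for R in (4, 8):
    for K in (10, 20, 40, 80):
        print(f"R={R} K={K:2d} |Q|={full_size(R, K):.1e} |Q_bar|={lumped_size(R, K):.1e}")
```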

To obtain a feasible means of determining the steady-state probability distribution of $\bar{X}$, each state $\bar{q} \in \bar{Q}$ can be conveniently represented by an ordered pair $(b, e)$ consisting of a sequence of “occupancy values” $b$ and an “occupancy vector” $e$ (see appendix A). Using this representation, algorithm 1 is based on a functional formulation of transitions to intermediate states (during a slot) that result from the “send” operation and, in turn, each intraslot arrival. Beginning with state $\bar{X}_t$, which expresses the buffer occupancy at the end of slot $t$, and accounting for these intraslot transitions during slot $t+1$, the resulting state (following the final cell arrival) is then the next state $\bar{X}_{t+1}$ of the lumped buffer model. (Again, see appendix A for further details.) Accordingly, if we account for the behavior of the Markovian source model $M = \{M_t \mid t \in T\}$ then, given that $\bar{X}_t = (b, e)$ and $M_t = m$ (the number of On sources during slot $t$), these functions, together with the transition probabilities of $M$, determine the conditional probabilities

$$P\left[\bar{X}_{t+1} = (b', e'),\ M_{t+1} = m' \mid \bar{X}_t = (b, e),\ M_t = m\right] \tag{5}$$

for all $(b', e') \in \bar{Q}$ and all $m' \in \{0, 1, \ldots, N\}$. In other words, if we let $Z_t$ be the pair of variables $(\bar{X}_t, M_t)$ and consider the corresponding stochastic process $Z = \{Z_t \mid t \in T\}$, then the transition probabilities of $Z$ at time $t$ are given by (5). Beginning with some arbitrary distribution for the initial state variable $Z_0 = (\bar{X}_0, M_0)$, the distribution of $Z_{t+1}$ can thus be determined iteratively from the distribution of $Z_t$ and the transition probabilities (5) at time $t$, for $t = 0, 1, 2, \ldots$, until a steady-state (stationary) condition is sufficiently well approximated. More precisely, the computation terminates when, for all $(b, e) \in \bar{Q}$ and all $m \in \{0, 1, \ldots, N\}$, the absolute value of the relative difference

$$\frac{P[Z_{t+1} = ((b, e), m)] - P[Z_t = ((b, e), m)]}{P[Z_t = ((b, e), m)]}$$

is less than some very small positive number. Given that $t$ satisfies this condition, the distribution we seek is then obtained by summing over the states of the source model $M$, i.e., for all $(b, e) \in \bar{Q}$,

$$P[\bar{X}_t = (b, e)] = \sum_{m=0}^{N} P[Z_t = ((b, e), m)].$$
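The iterative procedure above amounts to repeatedly propagating a distribution through a transition matrix and stopping on the relative-difference test. The Python sketch below assumes (hypothetically) that the transition probabilities (5) have already been assembled into a dense matrix over an enumeration of the pairs $((b, e), m)$; it also assumes the chain is irreducible and aperiodic, since otherwise the relative-difference test need not be met. A practical implementation would exploit sparsity rather than use a dense list of lists.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def iterate_to_steady_state(P: Sequence[Sequence[float]], tol: float = 1e-12,
                            max_iter: int = 1_000_000) -> List[float]:
    """Propagate a distribution through P until every component's relative
    change is at most tol (the stopping rule described in the text)."""
    n = len(P)
    pi = [1.0 / n] * n  # arbitrary initial distribution of Z_0
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # |P[Z_{t+1} = s] - P[Z_t = s]| <= tol * P[Z_t = s] for every state s;
        # a state with zero probability passes only if it stays at zero.
        if all(abs(nxt[j] - pi[j]) <= tol * pi[j] for j in range(n)):
            return nxt
        pi = nxt
    raise RuntimeError("no convergence within max_iter steps")


def marginal_over_sources(pi: Sequence[float],
                          states: Sequence[Tuple[Hashable, int]]) -> Dict[Hashable, float]:
    """Sum P[Z = (x, m)] over m to obtain P[X-bar = x]."""
    out: Dict[Hashable, float] = defaultdict(float)
    for p, (x, _m) in zip(pi, states):
        out[x] += p
    return dict(out)
```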


Although application of this algorithm becomes intractable for large values of $R$ and $K$, as indicated in Tables 1 and 2, the reduction in state-space size provided by the reduced model permits feasible solutions for switches with moderate dimensions. For example, when implemented on an HP9000 series 700 workstation, algorithm 1 can accommodate an 8×8 switch with random traffic and a buffer capacity of $K = 60$, or bursty traffic and a buffer capacity of $K = 40$. Moreover, as noted in section 1, exact models are likewise very useful if a large system can be approximately decomposed into smaller subsystems that admit such a representation. For instance, in addition to more obvious uses such as convolution, this type of “divide and conquer” approach was employed by the approximate method developed in [3]. Details as to how algorithm 1 can be exploited in concert with algorithm 2 are presented at the end of the subsection that follows.

3.2  Algorithm  2  (Approximate) 

This algorithm approximates the shared-buffer occupancy probabilities via an iterative procedure that considers, at each successive step, subsystems of growing size. Presuming an $N \times R$ switch with a shared buffer of capacity $K$, for a specified integer $r$, where $1 \le r \le R/2$, we initially choose two disjoint subsets $1(r)$ and $2(r)$ of the set $\{1, 2, \ldots, R\}$ of all output ports, where each subset has cardinality $r$. Although the choice of these subsets is largely arbitrary, to simplify the discussion we assume that both $R$ and $r$ are integer powers of 2. Moreover, without loss of generality, we can let $1(r)$ be output ports 1 through $r$ and $2(r)$ be ports $r+1$ through $2r$, i.e.,

$$1(r) = \{1, 2, \ldots, r\} \quad \text{and} \quad 2(r) = \{r+1, r+2, \ldots, 2r\}.$$

The shared buffer, together with the $2r$ output ports $1(r) \cup 2(r)$, will be referred to simply as an $r$-subsystem. In keeping with the notation of the exact method, the buffer state at a given time $t$ is described by the random variables

$X_{i(r),t}$ = number of cells in the buffer at time $t$ addressed to output ports in $i(r)$, $i = 1, 2$

and, to represent arrivals and departures in an analogous fashion, we let

$W_{i(r),t}$ = number of incoming cells at time $t$ addressed to ports in $i(r)$, $i = 1, 2$

$Z_{i(r),t}$ = number of cells that depart the buffer at time $t$ from output ports in $i(r)$, $i = 1, 2$.

Relative to this model of an $r$-subsystem, and recalling that $M_t$ is the number of sources in the On state at time $t$, let us now consider the following limiting distributions concerning arrivals ($A_r$), combined buffer occupancy and source activity ($B_r$, referred to as the “buffer-source” distribution), and departures ($D_r$).

$$A_r(w_1, w_2 \mid m) = \lim_{t \to \infty} P[W_{1(r),t} = w_1,\ W_{2(r),t} = w_2 \mid M_t = m] \tag{6}$$

$$B_r(x_1, x_2, m) = \lim_{t \to \infty} P[X_{1(r),t} = x_1,\ X_{2(r),t} = x_2,\ M_t = m] \tag{7}$$

$$\begin{aligned} D_r(z \mid x, m) &= \lim_{t \to \infty} P[Z_{1(r),t} = z \mid X_{1(r),t} = x,\ M_t = m] \\ &= \lim_{t \to \infty} P[Z_{2(r),t} = z \mid X_{2(r),t} = x,\ M_t = m] \end{aligned} \tag{8}$$


Computation of the conditional arrival probabilities $A_r(w_1, w_2 \mid m)$ is straightforward since, by the uniform routing assumption, each arriving cell has a probability $1/R$ of being addressed to a given output port. Hence, as both $1(r)$ and $2(r)$ have cardinality $r$, for either subset $i(r)$ ($i = 1, 2$), the probability of an arrival being addressed to a port in $i(r)$ is simply $r/R$. With this observation, let $\mathrm{Bi}_{n,p}$ denote the binomial distribution having parameters $n$ and $p$, i.e., for $0 \le i \le n$,

$$\mathrm{Bi}_{n,p}(i) = \binom{n}{i} p^i (1-p)^{n-i}.$$


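Then, in case all arrivals are accepted (the “no-loss” case), the formulation of $A_r$ is immediate: each of the $m$ arriving cells is addressed to a port in $1(r) \cup 2(r)$ with probability $2r/R$, and each such cell is equally likely to be addressed to either subset, i.e.,

$$A_r(w_1, w_2 \mid m) = \mathrm{Bi}_{m,\,2r/R}(w_1 + w_2) \cdot \mathrm{Bi}_{w_1 + w_2,\,1/2}(w_1). \tag{9}$$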
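Equation (9) is easy to evaluate directly. The following Python sketch (function names are our own) computes $A_r$ in the no-loss case. Because each of the $m$ cells is addressed to $1(r)$, to $2(r)$, or elsewhere with probabilities $r/R$, $r/R$, and $1 - 2r/R$, the product in (9) should coincide with a trinomial probability, which gives a convenient check.

```python
from math import comb


def binom_pmf(n: int, p: float, i: int) -> float:
    """Bi_{n,p}(i) = C(n, i) p^i (1 - p)^(n - i), and 0 outside 0 <= i <= n."""
    if not 0 <= i <= n:
        return 0.0
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)


def arrival_prob(w1: int, w2: int, m: int, r: int, R: int) -> float:
    """A_r(w1, w2 | m) in the no-loss case, Eq. (9)."""
    w = w1 + w2
    return binom_pmf(m, 2 * r / R, w) * binom_pmf(w, 0.5, w1)


# Example: R = 8 output ports, r = 2, m = 5 active sources.
A = {(w1, w2): arrival_prob(w1, w2, 5, 2, 8) for w1 in range(6) for w2 in range(6 - w1)}
print(sum(A.values()))  # total probability, approximately 1
```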

To extend (9) so that it can account for cell losses, we further assume that there is no statistical dependence between the address of an arriving cell and the event that it is one of the cells discarded among the $m$ that arrive. Under this assumption, the extension is easily obtained.

The departure distribution $D_r$, on the other hand, is more difficult to determine once the value of $r$ is greater than 1. It is here that we choose to introduce an approximate computation based on the following recursive formulation of $D_r$ in terms of $D_{r/2}$ and $B_{r/2}$ (where $r > 1$). (The inexact nature of this formula will be discussed in a moment.)


$$D_r(z \mid x, m) = \sum_{x_1=0}^{x} \sum_{z_1=0}^{z} D_{r/2}(z_1 \mid x_1, m)\, D_{r/2}(z - z_1 \mid x - x_1, m)\, E_{r/2}(x_1 \mid x, m) \tag{10}$$

where $E_r(x_1 \mid x_1 + x_2, m)$ is the probability that $x_1$ cells in the buffer are addressed to output ports in $1(r)$, given that i) $x_1 + x_2$ are addressed to ports in $1(r) \cup 2(r)$ and ii) $m$ sources are active, i.e.,

$$E_r(x_1 \mid x_1 + x_2, m) = \frac{B_r(x_1, x_2, m)}{\sum_{i,j \,:\, i + j = x_1 + x_2} B_r(i, j, m)} \tag{11}$$
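A minimal Python sketch of the recursion (10), with $E_{r/2}$ computed from $B_{r/2}$ via (11), is given below. The data layout (a dict for $B$ and a function for $D$) is a hypothetical choice for illustration; the $D_1$ used to seed it is the one given in Step 1 of the summary that follows.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representations: B_r as a dict {(x1, x2, m): prob} and
# D_r as a function d(z, x, m) returning D_r(z | x, m).
BufferSource = Dict[Tuple[int, int, int], float]
Departure = Callable[[int, int, int], float]


def d1(z: int, x: int, m: int) -> float:
    """D_1 from Step 1: a single port sends one cell iff its queue is non-empty."""
    if z == 0:
        return 1.0 if x == 0 else 0.0
    if z == 1:
        return 1.0 if x > 0 else 0.0
    return 0.0


def e_cond(b: BufferSource, x1: int, x: int, m: int) -> float:
    """Eq. (11): E(x1 | x, m), the probability that x1 of the x cells belong to 1(r)."""
    total = sum(b.get((i, x - i, m), 0.0) for i in range(x + 1))
    # States with zero buffer-source mass get E = 0 (they carry no probability).
    return b.get((x1, x - x1, m), 0.0) / total if total > 0.0 else 0.0


def next_departure(d_half: Departure, b_half: BufferSource) -> Departure:
    """Eq. (10): D_r from D_{r/2} and B_{r/2}, treating departures from the two
    halves as independent (the approximation discussed in the text)."""
    def d_r(z: int, x: int, m: int) -> float:
        return sum(
            d_half(z1, x1, m) * d_half(z - z1, x - x1, m) * e_cond(b_half, x1, x, m)
            for x1 in range(x + 1)
            for z1 in range(z + 1)
        )
    return d_r
```

Since each level wraps the previous one in a closure, repeated evaluation grows quickly with depth; caching the values of $D_r$ in a table (or with `functools.lru_cache`) would be the natural refinement.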


The knowledge of $A_r$ and $D_r$ permits the formulation of the transition probabilities for any pair of states in the model determined by $r$, i.e., the model that represents an aggregation of output ports according to sets $1(r)$ and $2(r)$. This, in turn, permits the computation of the steady-state buffer-source distribution $B_r$, using an iterative method similar to that employed by algorithm 1. Since this method relies on both $A_r$ and $D_r$, for a given value of $r$ (i.e., a given iteration of algorithm 2) the calculations of $A_r$ and $D_r$ must precede that of $B_r$. Once $B_r$ is computed, if $2r = R$ then the computation terminates since, in this case, the $r$-subsystem accounts for all the output ports. If not, the number of ports considered is doubled (i.e., $r$ is replaced by $2r$) and the computations are repeated for this larger $r$-subsystem; in particular, the new values for $D_r$ are obtained using the recursion given by (10). For additional details concerning the computation of $B_r$, please see appendix B. Accordingly, algorithm 2 can be summarized as follows.

Step 1: $r = 1$; $D_1(0 \mid x, m) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{else} \end{cases}$ and $D_1(1 \mid x, m) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{else} \end{cases}$

Step 2: Compute $A_r$ (see (9)).

Step 3: Compute $B_r$ (see appendix B).

Step 4: If $2r = R$, exit; otherwise continue.

Step 5: $r \leftarrow 2r$.

Step 6: Compute $D_r$ (see (10)).

Step 7: Go to Step 2.
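The control flow of Steps 1 to 7 can be sketched in Python as follows. The callbacks `compute_a`, `compute_b`, and `compute_d` are hypothetical placeholders for (9), the appendix B procedure, and (10); the optional `r_start` argument mirrors the modified algorithm discussed below, in which a subsystem with $r > 1$ is solved first by algorithm 1.

```python
from typing import Any, Callable


def run_algorithm2(
    R: int,
    d_start: Any,
    compute_a: Callable[[int], Any],
    compute_b: Callable[[int, Any, Any], Any],
    compute_d: Callable[[Any, Any], Any],
    r_start: int = 1,
) -> Any:
    """Control flow of Steps 1-7: compute_a(r) -> A_r, compute_b(r, A_r, D_r) -> B_r,
    compute_d(D_r, B_r) -> D_{2r}. Returns the final B_r, for which 2r = R."""
    if R & (R - 1) or r_start & (r_start - 1) or not 1 <= r_start <= R // 2:
        raise ValueError("R and r_start must be powers of 2 with 1 <= r_start <= R/2")
    r, d = r_start, d_start     # Step 1
    while True:
        a = compute_a(r)        # Step 2
        b = compute_b(r, a, d)  # Step 3
        if 2 * r == R:          # Step 4: the r-subsystem covers all R ports
            return b
        d = compute_d(d, b)     # Step 6 for the doubled r: D_{2r} from D_r and B_r
        r *= 2                  # Step 5
```

Starting from $r = 1$, the loop body runs $\log_2 R$ times, which matches the iteration count stated below.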

This algorithm yields a fairly good approximation of the steady-state occupancy probabilities of a shared-buffer switch with bursty traffic. The two principal sources of approximation error are the following.

1. In solving each $r$-subsystem model (Step 3), we assume that the storage capacity for cells addressed to the $2r$ ports in $1(r) \cup 2(r)$ coincides with the buffer capacity $K$. In reality, this capacity is shared among cells destined for all $R$ ports.

2. In the same step, we assume that the number of departures from ports in $1(r)$ is independent of the number of departures from ports in $2(r)$, i.e., for all $t \in T$, the random variables $Z_{1(r),t}$ and $Z_{2(r),t}$ are statistically independent. This is not generally true.

The number of main iterations of this algorithm is clearly $\log_2 R$. However, it is important to note that this number can be reduced if an $r$-subsystem can be solved directly for a value of $r$ that is greater than 1. This can be done by applying algorithm 1 to a special $N \times r$ shared-buffer system, where sources in the On state transmit cells with probability $r/R$. Accordingly, Step 1 of (modified) algorithm 2 then begins at a value $r > 1$, where data for this value is supplied by algorithm 1. This combined use of both algorithms (the “integrated” approach referred to in the title and introduction) obviously reduces the execution time. Moreover, it also improves the precision of the results, since each iteration introduces an error of approximation. Further discussion of the nature of such errors is deferred to the end of the section that follows.


4  RESULTS 

Recalling some of the motivation that was mentioned at the outset (see section 1), because of the severe requirements imposed on ATM cell loss probabilities, highly accurate results (with high levels of confidence) are difficult to obtain by simulation. Although there has been some progress in the development of fast simulation techniques for rare events, e.g., various forms of “importance sampling”, these typically rely on very special knowledge of the system in question. If approximate analytic methods are used instead then, even though they often provide reasonably accurate results in the higher-probability region of buffer occupancy, they tend to be much less accurate for large occupancy values. In other words, the asymptotic behavior of the distribution (its tail) is not well approximated. However, for purposes such as determining buffer dimensions that ensure satisfactory loss performance with respect to stringent cell loss probability requirements, accurate knowledge of this tail is crucial.

To pursue this matter in terms of the development of the previous section, we first examine the use of algorithm 1 as it applies to two of the logical output queues of an $N \times N$ shared buffer. Further, we suppose that the capacity of this buffer is large enough that, effectively, it can be regarded as infinite. This situation can thus be represented by an (exact) model of an $N \times 2$ shared-buffer switch, where On sources transmit cells with probability $2/N$. To satisfy the “effectively infinite” assumption, we take the buffer capacity $K$ to be large enough to ensure cell loss probabilities that are less than $10^{-15}$. Due to the small number of output ports, this model can be efficiently solved using algorithm 1. The purpose of the analysis is to study the correlation among the occupancy distributions of two queues in an $N \times N$ system. For the random (Bernoulli) traffic case, the joint occupancy distribution of the two queues was obtained in both [3] (using z-transforms) and [5]. The analysis here is similar to the latter, but is extended to the case of bursty sources.


Figure 1: Covariance between the lengths of two queues (x-axis: offered load).

Figure  1  displays  the  covariance  (between  two  queues)  as  a  function  of  offered  load,  for 
random  and  bursty  traffic  and  N  equal  to  16.  Among  other  things,  it  is  interesting  to  note  the 
sign  of  the  covariance.  For  the  random  traffic  case,  the  covariance  is  always  negative,  while  in 
the  case  of  bursty  traffic  it  is  positive  for  low  loads  and  becomes  negative  as  the  load 
increases. 

The knowledge of the joint distribution of the occupancy probability of the two queues, together with the assumption of an effectively infinite buffer, permits the variance of the buffer's occupancy distribution to be formulated as follows. Let $Y_\infty$ denote the random variable whose probability distribution is the limiting distribution of $Y_t$ as $t \to \infty$, i.e., $Y_\infty$ is the steady-state number of cells in the shared buffer. Similarly, let $X_{1,\infty}$ and $X_{2,\infty}$ be the random variables representing the steady-state occupancy of queue 1 and queue 2, respectively, in the $N \times 2$ system described above. Then, taking the subscripts $\infty$ to be implicit (context should suffice to convey the steady-state interpretation), the mean and variance of $Y = Y_\infty$ can be formulated as follows in terms of $X_1 = X_{1,\infty}$ and $X_2 = X_{2,\infty}$, where we let queue 1 be representative of single-queue behavior. Note that since $X_1$ and $X_2$ are identically distributed, $X_2$ could likewise serve this purpose.

$$E[Y] = N\,E[X_1] \tag{12}$$

$$\mathrm{Var}(Y) = N \cdot \mathrm{Var}(X_1) + N(N-1) \cdot \mathrm{Cov}(X_1, X_2) \tag{13}$$
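Formulas (12) and (13) translate directly into code once the joint distribution of $(X_1, X_2)$ is known, for instance from the $N \times 2$ model solved by algorithm 1. The Python sketch below assumes that joint pmf is available as a dict (a hypothetical layout); like (13), it relies on every pair of queues having the same covariance.

```python
from typing import Dict, Tuple


def moments_from_joint(joint: Dict[Tuple[int, int], float]) -> Tuple[float, float, float]:
    """Return E[X1], Var(X1), Cov(X1, X2) from a joint pmf {(x1, x2): prob}."""
    e1 = sum(p * x1 for (x1, _), p in joint.items())
    e2 = sum(p * x2 for (_, x2), p in joint.items())
    var1 = sum(p * (x1 - e1) ** 2 for (x1, _), p in joint.items())
    cov = sum(p * (x1 - e1) * (x2 - e2) for (x1, x2), p in joint.items())
    return e1, var1, cov


def shared_buffer_moments(N: int, joint: Dict[Tuple[int, int], float]) -> Tuple[float, float]:
    """E[Y] and Var(Y) for the total occupancy of N exchangeable queues, Eqs. (12)-(13)."""
    e1, var1, cov = moments_from_joint(joint)
    return N * e1, N * var1 + N * (N - 1) * cov
```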

Formula (13) can also be used to determine the error (with regard to variance) introduced by assuming that the queues are statistically independent. (The latter assumption is convenient since it permits the distribution of $Y$ to be obtained via the $N$-fold convolution of the distribution of a single queue.) Assuming such independence, $\mathrm{Var}(Y) = N \cdot \mathrm{Var}(X_1)$ and, accordingly, the error due to this assumption is given by $\delta = N(N-1)\,\mathrm{Cov}(X_1, X_2)$. Further, since the sign of $\delta$ is clearly the sign of the covariance, an approximation based on convolution underestimates the variance if $\mathrm{Cov}(X_1, X_2) > 0$. Since there is no error with regard to the value of $E[Y]$, we can reasonably expect that, by using convolution, the results are conservative only if the covariance is negative. Also, it appears that this approximation gets worse for growing values of $|\mathrm{Cov}(X_1, X_2)|$ and becomes useless when this value approaches that of $N \cdot \mathrm{Var}(X_1)$. This is borne out by Figure 2, which compares occupancy distributions obtained by convolution with corresponding (and more accurate) results determined by simulation. Specifically, this plot demonstrates that convolution provides an overly optimistic estimate of the distribution of $Y$ in the case of heavy, bursty traffic and, hence, cannot be relied on for practical applications.
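The independence-based approximation can be sketched in Python as an $N$-fold convolution of the single-queue pmf. The sketch (names are our own) also computes the variance of a pmf, so that the convolved variance, which equals $N \cdot \mathrm{Var}(X_1)$, can be compared with $\mathrm{Var}(Y)$ from (13); their difference is $\delta$.

```python
from typing import List, Sequence


def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """pmf of the sum of two independent non-negative integer random variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def independent_total(single: Sequence[float], N: int) -> List[float]:
    """Distribution of Y under the independence assumption: N-fold convolution."""
    total = [1.0]
    for _ in range(N):
        total = convolve(total, single)
    return total


def variance(pmf: Sequence[float]) -> float:
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```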


Figure  2:  Comparison  between  convolution  and  simulation. 

It is also worth noting that knowledge of the mean and variance of the shared-buffer occupancy distribution permits another approximation of loss performance. As mentioned in section 1, for the case of a shared buffer subjected to random traffic, [3] has proposed approximating the occupancy probability of an (infinite) shared buffer with the density of an appropriate Gamma distribution. This particular distribution was considered because of the exponential asymptotic behavior of its density function, which is typical of many queueing systems. More precisely, by computing the mean and variance of the occupancy distribution of an infinite shared buffer, this Gamma distribution is chosen such that its first two moments match the computed values. The loss probability of a $K$-capacity buffer is then estimated as the probability that a random variable with this Gamma density has a value greater than $K$.
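The two-moment Gamma fit can be sketched in Python as follows: shape $=$ mean$^2/$variance and scale $=$ variance$/$mean, and the loss estimate is the Gamma survival probability at $K$. The regularized incomplete gamma function is evaluated here with a standard series / continued-fraction pair (in the style of Numerical Recipes) so that the sketch needs only the standard library; with SciPy available, `scipy.stats.gamma.sf` would serve the same purpose. This is our illustration of the fit described in [3], not code from that reference.

```python
import math


def _series_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by its power series (x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _cf_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by a continued fraction
    (modified Lentz method; suited to x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def gamma_tail(mean: float, var: float, K: float) -> float:
    """P[G > K] for the Gamma law whose mean and variance match (mean, var)."""
    shape, scale = mean**2 / var, var / mean
    x = K / scale
    if x <= 0.0:
        return 1.0
    # The series is accurate below the mode region, the continued fraction above it.
    return 1.0 - _series_p(shape, x) if x < shape + 1.0 else _cf_q(shape, x)
```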


Figure  3:  Comparison  between  exact  results  and  Gamma  function  approximation. 

Figure 3 displays the occupancy distributions of a 4×4 switching element, with $K = 50$ and offered loads $\rho$ equal to 0.2 and 0.8. The Gamma distribution's density function is then compared with the distribution obtained from algorithm 1, indicating that the Gamma density provides a fairly good approximation for the higher-probability states. On the other hand, one can see that it fails to capture the asymptotic behavior in the low-probability region. Moreover, the estimation errors in this region are load-dependent, with values being overestimated for $\rho = 0.2$ and underestimated for $\rho = 0.8$.

We  now  turn  to  the  analysis  that  utilizes  algorithm  2.  As  noted  earlier,  this  algorithm 
estimates  the  loss  probability  of  a  shared-buffer  system  with  bursty  traffic,  even  in  cases 
where  the  buffer  is  large.  To  validate  this  approach,  we  compare  the  results  obtained  by 
algorithm  2  with  those  obtained  by  simulation  (for  large  buffers  and  high  loads)  and  by 
algorithm  1  (for  smaller  buffers). 

Figures 4 and 5 plot the loss probability as a function of buffer capacity $K$ for both an 8×8 (Figure 4) and a 16×16 (Figure 5) switch. In each case, we consider random traffic and three different values of offered load ($\rho = 0.2, 0.7, 0.8$).
Figure 4: Comparison between exact and approximate results; random traffic.


Figure  5:  Comparison  between  exact  and  approximate  results;  random  traffic. 

Comparisons of the model results (straight lines) with simulation data and with exact analytic results (both plotted as circles) reveal that the error increases as the capacity $K$ gets larger. However, in all the cases considered, the error is less than an order of magnitude. Included in the same figures are some plots of results obtained from the algorithm described in [5] (dashed lines). In all cases, it can be seen that algorithm 2 provides a more accurate estimate of cell loss probability.


Figure 6: Comparison between exact and approximate results; L = 100 (x-axis: buffer capacity).

Figure 7: Comparison between exact and approximate results; L = 100 (x-axis: buffer capacity).
Figures 6 and 7 are similar to Figures 4 and 5, respectively, except that we now consider traffic sources that are bursty. Specifically, the mean burst length of a source is $L = 100$ and, again, two values of offered load are considered, namely $\rho = 0.2$ and $\rho = 0.8$.

To illustrate another application that integrates the use of an exact method (algorithm 1) with an approximate formulation (described below), we estimate the advantages, in terms of required storage capacity, of a shared-buffer architecture as compared with a (simpler) switch having dedicated output queues (no sharing). For given values of $N$, $R$, $L$, and $p_{in}$, along with an admissible (maximum allowed) cell loss probability $p_a$, let $K$ be the capacity of the shared buffer and let $\hat{K}$ be the capacity of each of the output buffers in the dedicated case. Hence, $R\hat{K}$ is the total capacity of the latter. Further, let $s$ denote the fraction of this capacity that is required in the case of a shared-memory switch (presuming the same value of $p_a$ for each), i.e., $s = K/(R\hat{K})$. Thus the value of $s$ (the “relative saving”) can vary from a minimum of $1/R$ (corresponding to the greatest theoretical reduction in required memory) to a maximum of 1 (no reduction).
Figure 8: Memory saving afforded by buffer sharing.

Figure 8 illustrates how this relative saving varies as a function of the mean burst length $L$ for different choices of $N = R \in \{4, 8\}$. The activity $p_{in}$ of a source is equal to 0.8 and two loss probability targets are considered. As indicated by the figure, we have the following observations with respect to the combinations of parameter values considered.

a)  The  advantage  of  buffer  sharing  increases  (as  reflected  by  smaller  values  of  s)  as  the 
number  of  ports  gets  larger. 

b)  Likewise,  buffer  sharing  is  more  advantageous  as  the  loss  probability  target  becomes 
lower  (more  severe). 

For very low loss probability targets $p_a$, observation b) suggests an empirical means of obtaining a quick and conservative estimate of the required capacity $K$ of a shared-buffer switch. Let $K(p_a)$, $\hat{K}(p_a)$, and $s(p_a)$ be the values of $K$, $\hat{K}$, and $s$ that result from a particular choice of $p_a$. Then, from the definition of $s$, it follows that

$$K(p_a) = s(p_a)\, R\, \hat{K}(p_a). \tag{14}$$

If the value of $p_a$ is relatively high (in an ATM context, $p_a \ge 10^{-6}$) then $K(p_a)$ can be obtained by simulation. As for $\hat{K}(p_a)$, algorithm 1 can serve to determine its value (letting $R = 1$), even in cases where $p_a$ is much smaller (as we exploit below). Then, to “scale up” these calculations to a more severe loss requirement, let $p_a'$ denote a lower admissible cell loss probability, i.e., $p_a' < p_a$. Then by observation b) it follows that $s(p_a') \le s(p_a)$ and hence, applying (14), we have

$$K(p_a') = s(p_a')\, R\, \hat{K}(p_a') \le s(p_a)\, R\, \hat{K}(p_a') = K(p_a)\, \hat{K}(p_a') / \hat{K}(p_a). \tag{15}$$

Given the value of $\hat{K}(p_a')$, which again can be accurately determined using algorithm 1, (15) thus provides a conservative estimate of shared-buffer capacity in cases where the target loss probability is much lower (e.g., $p_a' \le 10^{-9}$), namely

$$K_{est}(p_a') = K(p_a)\, \hat{K}(p_a') / \hat{K}(p_a).$$
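The dimensioning rule reduces to simple arithmetic once the three capacities are known. In the Python sketch below, $K(p_a)$ would come from simulation and both values of $\hat{K}$ from algorithm 1 with $R = 1$; the numbers in the usage lines are hypothetical placeholders, not results from this study.

```python
import math


def relative_saving(K_shared: float, K_dedicated: float, R: int) -> float:
    """s = K / (R * K_hat): fraction of dedicated-buffer memory a shared buffer needs."""
    return K_shared / (R * K_dedicated)


def k_estimate(K_shared_pa: float, K_ded_pa: float, K_ded_pa_low: float) -> float:
    """Conservative estimate K_est(p_a') = K(p_a) * K_hat(p_a') / K_hat(p_a), from (15)."""
    return K_shared_pa * K_ded_pa_low / K_ded_pa


def relative_error(K_est: float, K_actual: float) -> float:
    """(K_est - K) / K, the quantity plotted in Figure 9."""
    return (K_est - K_actual) / K_actual


# Hypothetical capacities in cells: K(p_a) = 100, K_hat(p_a) = 50, K_hat(p_a') = 80.
K_est = k_estimate(100, 50, 80)
print(math.ceil(K_est))  # round up to a whole number of cells
```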

For both a 4×4 and an 8×8 switch ($R = 4, 8$) and as a function of traffic burstiness, Figure 9 illustrates the extent to which $K_{est}(p_a')$ overestimates the actual values.


Figure  9:  Validation  of  the  empirical  dimensioning  rule. 
Here, the original and modified target loss probabilities considered are $p_a = 10^{-6}$ and $p_a' = 10^{-9}$, respectively, under an assumed offered load of $\rho = 0.8$. As can be observed, the relative error of the estimate (i.e., the quantity $(K_{est} - K)/K$) is somewhat smaller for $R = 4$ than for $R = 8$. For both switches, this error increases slowly as the mean burst length $L$ becomes larger. In particular, for an 8×8 switch with $L = 160$, the relative error is approximately 0.2 (20%).


5  SUMMARY 

By employing an exact solution method (algorithm 1) along with a new approximate method (algorithm 2), we have shown that a synergistic use of both can be beneficial in the context of shared-buffer switch evaluation. Aside from the usual advantages associated with comparing exact vs. approximate results, we find that true integration, where both are employed for a single purpose, can likewise be very useful. Further, we have shown that algorithm 2, particularly if used in concert with algorithm 1, can provide reasonably accurate results even for realistically large switches in the presence of a bursty traffic environment. Finally, by examining the reduction in buffer capacity, relative to a dedicated-buffer architecture, that results from buffer sharing, we have found that memory saving increases as the target loss probability decreases. In turn, this suggested an empirical means of conservatively dimensioning a shared-buffer switch such that, even in the case of extremely low target probabilities, the capacity so determined is reasonably close to what is actually required.
APPENDIX A

Recalling notation introduced in section 3.1, we let $Q$ be the set of possible (buffer occupancy) states of the $R$ logical queues of a shared buffer of finite capacity $K$, i.e., if $q_i$ is the number of cells stored in logical queue $i$ ($0 \le q_i \le K$, $1 \le i \le R$) then

$$Q = \{(q_1, q_2, \ldots, q_R) \mid 0 \le q_1 + q_2 + \cdots + q_R \le K\}.$$

In turn, we identify states that are permutations of one another via an equivalence relation on $Q$, letting $\bar{Q}$ denote its corresponding partition (set of equivalence classes). As noted in section 3.1, the shared buffer can then be represented by the reduced stochastic process $\bar{X} = \{\bar{X}_t \mid t \in T\}$, where $\bar{X}_t$ is the equivalence class that contains state $X_t$.

In what follows, we show how the transition structure of $\bar{X}$ can be described, in part, by a convenient representation of the elements of $\bar{Q}$. Specifically, if $q = (q_1, q_2, \ldots, q_R) \in Q$, let $\bar{q}$ denote its equivalence class, i.e., the state in $\bar{Q}$ that contains $q$. An occupancy value for $\bar{q}$ is then the value of some coordinate $q_i$ of $q$.

Further, let $b = (b_1, b_2, \ldots, b_r)$ list, in increasing order, all the different occupancy values for $\bar{q}$ (where $1 \le r \le R$). We can then define the occupancy vector of $\bar{q}$ to be the $r$-tuple $e = (e_1, e_2, \ldots, e_r)$, where $e_j$ is the number of different logical queues having occupancy value $b_j$ ($1 \le e_j \le R$).

By the definitions of $\bar{Q}$ and $\bar{q}$, both $b$ and $e$ are invariant relative to the choice of a representative state $q \in \bar{q}$. It is also easily shown that if $\bar{q}$ and $\bar{q}'$ are distinct states in $\bar{Q}$, then the corresponding ordered pairs $(b, e)$ and $(b', e')$ are likewise distinct. In other words, this pair of $r$-tuples provides a unique representation of a state in $\bar{Q}$. Moreover, the set of all such representations is just the set of all ordered pairs $(b, e)$, with $b = (b_1, b_2, \ldots, b_r)$ and $e = (e_1, e_2, \ldots, e_r)$, such that

$$1 \le r \le R,$$

$$0 \le b_i \le K,$$

$$b_1 < b_2 < \cdots < b_r,$$

$$\sum_{j=1}^{r} e_j = R, \quad \text{and}$$

$$\sum_{j=1}^{r} e_j b_j \le K.$$
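The following Python sketch illustrates this representation. It maps a raw occupancy vector $q$ to its pair $(b, e)$, enumerates the equivalence classes for small $R$ and $K$, and checks the conditions listed above. The function names are hypothetical. The buffer-capacity test is written as $\sum_j e_j b_j \le K$, which follows from the definition of $Q$.

```python
from collections import Counter
from itertools import combinations_with_replacement


def to_pair(q):
    """Map an occupancy vector q = (q_1, ..., q_R) to its class representation (b, e)."""
    counts = Counter(q)
    b = tuple(sorted(counts))          # distinct occupancy values, increasing
    e = tuple(counts[v] for v in b)    # number of logical queues holding each value
    return b, e


def enumerate_classes(R, K):
    """All equivalence classes of Q: occupancy multisets of size R with total <= K."""
    return [to_pair(c)
            for c in combinations_with_replacement(range(K + 1), R)
            if sum(c) <= K]


def is_valid_pair(b, e, R, K):
    """Check the conditions that characterize a valid pair (b, e)."""
    r = len(b)
    return (1 <= r <= R and len(e) == r
            and all(0 <= v <= K for v in b)
            and all(b[j] < b[j + 1] for j in range(r - 1))
            and all(n >= 1 for n in e)
            and sum(e) == R
            and sum(v * n for v, n in zip(b, e)) <= K)  # buffer capacity
```

For example, `to_pair((2, 0, 2, 1))` gives `((0, 1, 2), (1, 1, 2))`. The permutation `(0, 2, 1, 2)` yields the same pair.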


From this point on, a state of the process $\bar{X}$ will be identified with its corresponding pair $(b, e)$, where the latter is now regarded as an element of $\bar{Q}$. In these terms, the transition structure of the process can be formulated as follows.

Suppose the system is in a state $\bar{X}_t = (b, e)$ at the beginning of time slot $t$. The state of the system after departures caused by the "send" operation in slot $t$ is given by the function $d(b, e)$ below. This is an intermediate state that precedes the subsequent arrivals; hence, it is not an explicit part of the behavior of $\bar{X}$. The function is
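$$d(b, e) = \begin{cases}
\big((b_1 - 1, b_2 - 1, \ldots, b_r - 1),\ e\big) & \text{if } b_1 > 0, \\
\big((b_1, b_2 - 1, \ldots, b_r - 1),\ e\big) & \text{if } b_1 = 0 \text{ and } b_2 > 1, \\
\big((0, b_3 - 1, \ldots, b_r - 1),\ (e_1 + e_2, e_3, \ldots, e_r)\big) & \text{if } b_1 = 0 \text{ and } b_2 = 1.
\end{cases}$$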
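The departure map $d$ can be sketched in Python as follows. This is a minimal illustration, not the authors' code.

- Indices are 0-based.
- `depart` is a hypothetical name.
- The all-empty state $b = (0)$, which the case list does not cover, is assumed to be left unchanged.

```python
def depart(b, e):
    """Sketch of d(b, e): every non-empty logical queue sends one cell.

    b holds the distinct occupancy values in increasing order and e the
    number of queues at each value (0-based tuples).
    """
    b, e = tuple(b), tuple(e)
    if b[0] > 0:
        # Case 1: no queue is empty, so every occupancy value drops by one.
        return tuple(v - 1 for v in b), e
    if len(b) == 1:
        # Assumption: all queues are empty, so nothing departs.
        return b, e
    if b[1] > 1:
        # Case 2: empty queues stay at 0; all others drop by one.
        return (0,) + tuple(v - 1 for v in b[1:]), e
    # Case 3: queues holding one cell become empty and merge with the empty class.
    return (0,) + tuple(v - 1 for v in b[2:]), (e[0] + e[1],) + e[2:]
```

For instance, with four queues holding $(0, 1, 1, 3)$ cells, i.e. $b = (0, 1, 3)$ and $e = (1, 2, 1)$, the send step leaves $(0, 0, 0, 2)$. That corresponds to `((0, 2), (3, 1))`.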


Suppose now that $(b, e)$ is the intermediate state so determined by the function $d$. The next state $\bar{X}_{t+1}$ is then a function of the arriving cells as well as of $(b, e)$. Assume that each cell is addressed uniformly at random to one of the output queues. Then the probability that an incoming cell is addressed to one of the $e_j$ logical queues (each with $b_j$ cells) is clearly $e_j/R$.

In a manner similar to how $d$ is defined, a function $a$ (suggesting "arrival") determines the state that results from the entry of an arriving cell into one of these $e_j$ queues. This is again an intermediate state, unless it is the last arrival during slot $t$. The arguments of $a$ are the intermediate state $(b, e)$ and the value $j$ that identifies the queue subset containing the arrival's destination queue. The function can be expressed as follows.


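$$a((b, e), j) = \begin{cases}
\big((b_1, \ldots, b_j, b_j + 1, b_{j+1}, \ldots, b_r),\ (e_1, \ldots, e_{j-1}, e_j - 1, 1, e_{j+1}, \ldots, e_r)\big) & \text{if } b_{j+1} > b_j + 1 \text{ and } e_j > 1, \\
\big((b_1, \ldots, b_{j-1}, b_j + 1, b_{j+1}, \ldots, b_r),\ (e_1, \ldots, e_{j-1}, 1, e_{j+1}, \ldots, e_r)\big) & \text{if } b_{j+1} > b_j + 1 \text{ and } e_j = 1, \\
\big(b,\ (e_1, \ldots, e_{j-1}, e_j - 1, e_{j+1} + 1, e_{j+2}, \ldots, e_r)\big) & \text{if } b_{j+1} = b_j + 1 \text{ and } e_j > 1, \\
\big((b_1, \ldots, b_{j-1}, b_{j+1}, \ldots, b_r),\ (e_1, \ldots, e_{j-1}, e_{j+1} + 1, e_{j+2}, \ldots, e_r)\big) & \text{if } b_{j+1} = b_j + 1 \text{ and } e_j = 1.
\end{cases}$$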
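The arrival map $a$ and the class-selection probabilities $e_j/R$ can be sketched in the same way. This is a minimal sketch under stated assumptions, not the authors' implementation.

- `j` is 0-based, and the names are hypothetical.
- For the highest class ($j = r$) there is no $b_{j+1}$; we treat it like the case $b_{j+1} > b_j + 1$.
- The caller is assumed to have checked that the buffer still has room for the cell.

```python
from fractions import Fraction


def arrive(b, e, j):
    """Sketch of a((b, e), j): one cell joins a queue from class j (0-based)."""
    b, e = list(b), list(e)
    # Assumption: the highest class behaves like the "gap" case b_{j+1} > b_j + 1.
    gap = j + 1 == len(b) or b[j + 1] > b[j] + 1
    if gap and e[j] > 1:
        # One queue leaves class j and forms a new class with value b_j + 1.
        return (tuple(b[: j + 1] + [b[j] + 1] + b[j + 1:]),
                tuple(e[:j] + [e[j] - 1, 1] + e[j + 1:]))
    if gap:
        # e_j == 1: the lone queue in class j simply moves up to b_j + 1.
        b[j] += 1
        return tuple(b), tuple(e)
    if e[j] > 1:
        # b_{j+1} == b_j + 1: one queue migrates into class j + 1.
        e[j] -= 1
        e[j + 1] += 1
        return tuple(b), tuple(e)
    # b_{j+1} == b_j + 1 and e_j == 1: class j merges into class j + 1.
    e[j + 1] += 1
    return tuple(b[:j] + b[j + 1:]), tuple(e[:j] + e[j + 1:])


def arrival_class_probs(e):
    """Probability e_j / R that a uniformly addressed cell targets class j."""
    R = sum(e)
    return [Fraction(n, R) for n in e]
```

As an example, with queues at $(0, 1, 1, 3)$, a cell addressed to the empty queue gives $(1, 1, 1, 3)$. Correspondingly, `arrive((0, 1, 3), (1, 2, 1), 0)` returns `((1, 3), (3, 1))`.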


As described in section 3.1, these functions then serve to formulate the transition probabilities of the composite process $Z = \{(\bar{X}_t, M_{t+1}) \mid t \in T\}$.


APPENDIX  B 

The key step of the approximate method proposed in section 3.2 is to find the steady-state distribution $B_r(x_1, x_2, m)$, given the distributions $D_r(z \mid x, m)$ and $A_r(w_1, w_2 \mid m)$ of the departure and arrival processes, respectively. Recalling the exact meanings of each, we have:

$B_r(x_1, x_2, m)$ = the steady-state probability of having $x_1$ cells in the buffer addressed to output ports in the set $1(r) = \{1, 2, \ldots, r\}$, $x_2$ addressed to output ports in the set $2(r) = \{r+1, r+2, \ldots, 2r\}$, and $m$ sources in the On state.

$D_r(z \mid x, m)$ = the steady-state probability of having $z$ cell departures during a slot from ports in $1(r)$ (alternatively $2(r)$), given that $x$ cells are addressed to these ports and $m$ sources are active.

$A_r(w_1, w_2 \mid m)$ = the steady-state probability of having $w_1$ cell arrivals during a slot addressed to ports in $1(r)$ and $w_2$ addressed to ports in $2(r)$, given that $m$ sources are active.

To compute $B_r(x_1, x_2, m)$, we employ an iterative algorithm similar to the one used for the exact model. Let $B_r^t(x_1, x_2, m)$ be the probability distribution at time $t$; in the limit as $t \to \infty$, it yields the steady-state distribution $B_r(x_1, x_2, m)$. Further, let $C_r^t(x_1, x_2, m)$ be the probability distribution of the intermediate state during slot $t$ that results from cell departures but precedes slot $t$ arrivals. Then it can be shown that, for all $t \in T$,


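$$C_r^t(x_1, x_2, m) = \sum_{i = x_1 + \min(x_1, 1)}^{\min(x_1 + r,\, K)} \ \sum_{j = x_2 + \min(x_2, 1)}^{\min(x_2 + r,\, K - i)} B_r^t(i, j, m)\, D_r(i - x_1 \mid i, m)\, D_r(j - x_2 \mid j, m)$$

$$B_r^{t+1}(x_1, x_2, m) = \sum_{i = \max(x_1 - m,\, 0)}^{x_1} \ \sum_{j = \max(x_2 - m + x_1 - i,\, 0)}^{x_2} \ \sum_{l = 0}^{N} C_r^t(i, j, l)\, A_r(x_1 - i, x_2 - j \mid m)\, S(m \mid l)$$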


Here $S(m \mid l)$ is the (time-invariant) probability that $m$ sources are active in a slot, given that $l$ were active in the previous slot. As before, we iteratively compute $B_r^t(x_1, x_2, m)$ for growing $t$ until we reach a time $t$ that yields a sufficiently close approximation of the steady-state distribution $B_r(x_1, x_2, m)$. Again, the iteration terminates at a value of $t$ for which the maximum relative difference between the probabilities for slots $t$ and $t+1$ is less than some very small number.
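The stopping rule above can be illustrated on a generic finite Markov chain: iterate until the maximum relative difference between successive distributions falls below a small threshold. The sketch below is a simplification and makes these assumptions:

- It applies a single transition matrix `P` per step, rather than the two-step departure/arrival recursion for $C_r^t$ and $B_r^{t+1}$.
- The function name and the default tolerance are our own choices.

```python
def iterate_to_steady_state(P, tol=1e-12, max_iter=100_000):
    """Power iteration pi_{t+1} = pi_t P, stopped when the maximum relative
    difference between successive iterates falls below tol.

    P is a row-stochastic matrix given as a list of lists.
    """
    n = len(P)
    pi = [1.0 / n] * n  # arbitrary starting distribution
    for t in range(1, max_iter + 1):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Relative difference is taken over states with positive probability only.
        max_rel = max(
            (abs(nxt[j] - pi[j]) / nxt[j] for j in range(n) if nxt[j] > 0),
            default=0.0,
        )
        pi = nxt
        if max_rel < tol:
            return pi, t
    raise RuntimeError("no convergence within max_iter iterations")
```

For the two-state chain `[[0.9, 0.1], [0.5, 0.5]]`, the iteration settles on approximately `[5/6, 1/6]`. A relative (rather than absolute) criterion matters in this setting because many of the state probabilities are very small.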
Abstract

This  paper  studies  the  nonblocking  switching  operation  of  3-stage  Clos  networks  in  the 
multirate  environment.  In  particular,  we  concentrate  on  the  strictly  nonblocking  mode  of 
operation  of  these  switching  networks.  Our  analysis  determines  bounds  for  the  minimum 
number  of  middle-stage  switches  required  for  strictly  nonblocking  operation.  Several  cases 
of  the  multirate  environment  are  considered,  including  discrete  and  continuous  bandwidth 
multirate  traffic.  We  survey  the  results  already  reported  in  the  literature  and  we  extend 
them  for  general  asymmetrical  Clos  networks.  We  also  generalize  them  for  the  case  in 
which  the  internal  links  have  higher  bandwidth  capabilities  than  the  input/output  ports. 
In  addition  to  these  extensions,  we  derive  a  more  general  result,  which  not  only  provides 
a  tight  bound  for  various  multirate  cases,  but  also  improves  an  already  existing  bound  for 
the  case  of  continuous  bandwidth  multirate  traffic. 


Keywords 

Asymmetrical  Clos  networks,  nonblocking  operation,  multirate  networks,  computer  communication  networks,  ATM  switches.


1  INTRODUCTION 

The  theory  of  nonblocking  switching  was  originally  motivated  by  the  problem  of  designing 
switching  systems  capable  of  connecting  any  pair  of  idle  ports  under  arbitrary  traffic 
conditions. 

One of the first switching fabrics recognized to achieve nonblocking operation was the crossbar switch with $n$ input/output ports and $n^2$ crosspoints, but its cost is prohibitive for large systems. In 1953, Charles Clos (Clos, 1953) wrote a famous paper introducing design methodologies for switching networks capable of achieving nonblocking operation with significantly fewer crosspoints. This milestone work was the foundation of the theory that has since been developed by Benes, Pippenger and many others, e.g. (Benes, 1965), (Pippenger, 1982), (Cantor, 1971), (Feldman et al., 1986), and (Masson et al., 1979).

Figure  1  A  symmetric  3-stage  Clos  network.

1.1  General  Description  of  Clos  Networks 

A  three-stage  Clos  network  consists  of  three  successive  stages  of  switching  elements  which 
are  interconnected  by  links.  In  a  symmetric  three-stage  network,  all  switching  elements 
in  a  stage  are  uniform  (see  Figure  1).  In  the  symmetric  Clos  network  of  Figure  1,  there 
are $r$ switches of size $n \times m$ in the first stage, $m$ switches of size $r \times r$ in the second, and $r$ switches of size $m \times n$ in the third. This network thus interconnects $n \cdot r$ input ports with $n \cdot r$ output ports.
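To make the crosspoint savings concrete, the following sketch counts crosspoints in the symmetric network of Figure 1:

- $r$ first-stage switches with $nm$ crosspoints each,
- $m$ middle-stage switches with $r^2$ crosspoints each, and
- $r$ third-stage switches with $mn$ crosspoints each.

By default it uses $m = 2n - 1$, Clos's strictly nonblocking choice discussed in Section 1.4. The function names are hypothetical.

```python
from typing import Optional


def clos_crosspoints(n: int, r: int, m: Optional[int] = None) -> int:
    """Crosspoints of a symmetric 3-stage Clos network with r outer switches
    of n ports each and m middle switches (default m = 2n - 1)."""
    if m is None:
        m = 2 * n - 1
    return r * n * m + m * r * r + r * m * n


def crossbar_crosspoints(N: int) -> int:
    """Crosspoints of an N x N crossbar."""
    return N * N
```

With $n = r = 32$ (so $N = 1024$), the Clos network needs 193,536 crosspoints, compared with 1,048,576 for a $1024 \times 1024$ crossbar. For very small networks the comparison reverses: with $n = r = 2$, the Clos network needs 36 crosspoints versus 16 for a $4 \times 4$ crossbar.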
In this paper, we also consider the more general class of asymmetrical 3-stage Clos networks (Varma et al., 1993) and study their nonblocking operation in the multirate environment.

Figure  2  illustrates  the  general  class  of  asymmetrical  3-stage  Clos  networks  considered 
in  this  paper.  The  differences  from  the  symmetric  case  are  in  the  number  of  switches  per 
stage,  the  number  of  input/output  ports  per  switch  and  in  the  capacities  of  the  internal 
links.  In  the  asymmetrical  case,  all  these  quantities  may  have  different  values. 


1.2  Multirate  Networks 

In  the  early  years  of  the  exploration  of  switching  networks,  the  control  algorithms  to 
achieve  nonblocking  operation  were  simple.  The  method  of  Circuit  Switching  was  the 
dominant  control  operation,  according  to  which,  in  order  to  establish  a  connection  from 
point  A  to  point  B,  one  had  to  progressively  reserve  the  links  and  establish  the  entire 
path  from  A  to  B.  All  the  resources  in  that  path  were  allocated  to  the  connection  during 
its  lifetime,  independent  of  the  connection’s  bandwidth  requirements. 

In this Classic Circuit Switching (CCS) environment, the fundamental assumption is that each link is allocated to exactly one connection; that is, no link can accommodate more than one connection simultaneously.

Since  those  times,  circuit  switching  methods  have  evolved  significantly.  Physical  links 
are  now  time-multiplexed  and  the  concepts  of  virtual  channels  and  packet  switching  have 
matured  and  have  been  applied  extensively. 

During the past 15 years, the trend in communication systems has been towards serving applications with a wide variety of characteristics. Such systems are designed to support connections with arbitrary data rates, ranging from a few bits per second to several hundreds of megabits per second, e.g. (Coudreuse et al., 1987), (Huang et al., 1984), and (Turner, 1988). These systems usually carry information in multiplexed format. In contrast to earlier systems, however, each connection can utilize an arbitrary fraction of the bandwidth of the link carrying it. Typically, information is carried through a link by statistically multiplexing a set of data-packet sequences belonging to a number of services. Switching networks that can support services with arbitrary bandwidth requirements are called multirate networks. Multirate networks can be considered the infrastructure that supports the Asynchronous Transfer Mode (ATM) protocol for broadband integrated services (B-ISDN).

Both constant bit-rate (CBR) and variable bit-rate (VBR) services can be accommodated in the multirate switching environment. A VBR service can be considered as a sequence of CBR services, each of which has a fixed bandwidth requirement
during  a  specific  time  interval.  This  bandwidth  requirement  is  the  maximum  bandwidth 
requirement  of  the  corresponding  VBR  service  throughout  the  same  time  interval.  Bursty 
traffic  can  be  modeled  in  a  similar  way  as  traffic  consisting  of  VBR  services.  Therefore, 
a  service  with  bursty  characteristics  may  be  connected,  disconnected,  or  rearranged  at 
different  time  intervals,  depending  on  its  bandwidth  demands. 
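The VBR-to-CBR decomposition just described can be sketched as follows. For illustration only, we assume that:

- the VBR service's rate is given as a list of per-slot samples, and
- the time intervals are consecutive blocks of a fixed number of samples.

Each block is replaced by its peak rate. The function name is hypothetical.

```python
def cbr_envelope(rates, interval):
    """Replace each block of `interval` consecutive rate samples by its peak,
    giving the fixed (CBR) bandwidth requirement for that time interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [max(rates[s:s + interval]) for s in range(0, len(rates), interval)]
```

For example, `cbr_envelope([1, 3, 2, 2, 5, 1, 4], 3)` yields `[3, 5, 4]`. Shorter intervals track the VBR profile more closely, at the price of more frequent reconnection or rearrangement.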

One  way  to  operate  multirate  networks  is  to  select,  for  each  incoming  connection,  a 
path  through  the  switching  system  to  be  used  by  all  packets  belonging  to  that  connection. 
The  path  selection  should  be  such  that  the  available  bandwidth  on  all  intermediate  links 
selected is enough to carry the connection through. In order to guarantee a high quality of service, our goal should be to design switching networks with as small a blocking probability as possible.

Multirate  switching  provides  the  framework  for  popular  trends  such  as  multimedia  and 
highly  diverse  service  demands,  as  well  as  for  state-of-the-art  standards  such  as  the  ATM 
(Asynchronous  Transfer  Mode).  Therefore,  it  is  very  important  to  study  and  explore 
the  properties  of  nonblocking  operation  of  multirate  switching  systems.  Such  a  study  is 
also  necessary  because  the  multirate  model  further  covers  packet-switching  techniques, 
which  cannot  be  analyzed  by  the  CCS  model.  The  analysis  of  traditional  circuit  switching 
methods  provides  us  with  the  tools  to  use  in  today’s  demanding  B-ISDN*  environment.

The  extension  to  multirate  switching  is  not  a  trivial  problem.  The  primary  difficulty 
arises  from  the  fact  that  the  bandwidth  requirements  of  the  various  services  are  not  fixed 
any  more,  but  instead  can  be  very  diverse.  More  sophisticated  algorithms  are  needed,  in 
order  to  allocate  the  required  bandwidth  for  each  service  request  appropriately,  without 
compromising  hardware  resources  or  the  quality  of  service.  These  complex  algorithms, 
along  with  the  variability  of  the  service  demands,  make  the  analysis  of  multirate  networks 
a  very  challenging  problem. 

We  consider  two  cases  of  the  multirate  environment  in  our  study:  (a)  the  discrete 
bandwidth  case,  and  (b)  the  continuous  bandwidth  case. 

Discrete Bandwidth case: The weight of every connection belongs to a finite set $\{b_1, b_2, \ldots\}$, where $b_k = k b_1$ for $k = 1, 2, \ldots$ Denote $b = b_1$ and $B = \max_k \{b_k\}$. In this case, to simplify the notation, we will assume that $1/b$ is an integer. The proofs still hold (with a little modification) if we do not impose this restriction.

Continuous Bandwidth case: Assume that each input and output port has a maximum capacity of $\beta$, and each internal link has a maximum capacity of 1 ($b$, $B$ and $\beta$ are normalized with respect to the internal link capacity). Then the weights of all connections belong to the closed interval $[b, B]$, where $0 \le b \le B \le \beta \le 1$.

The analysis for the continuous bandwidth case can also be used for the discrete bandwidth case, since the discrete bandwidth case is a special case of the continuous bandwidth case. However, the discrete bandwidth case is more restrictive and, therefore, tighter bounds can be derived for it by using specialized analysis. Furthermore, this special case arises in several applications that do not require the generality of the continuous bandwidth environment. For these reasons, we study both cases separately.

1.3  Nonblocking  Operation 

One  way  to  operate  switching  networks  is  to  select  for  each  incoming  connection  a  path 
through  the  switching  system  to  be  used  by  all  packets  belonging  to  that  connection. 
The  path  selection  should  be  such  that  the  available  bandwidth  on  all  intermediate  links 
selected  is  enough  to  carry  the  connection  through.  In  order  to  guarantee  high  quality 
of  service,  our  goal  should  be  to  design  switching  networks  with  as  small  a  blocking 
probability  as  possible.  Therefore,  it  is  very  important  to  study  and  explore  the  properties 
of  nonblocking  operation  of  multirate  switching  systems. 

There is more than one way for a switch to achieve nonblocking operation, depending on how the incoming connections are distributed over the network's resources. Each connection can be established by allocating network resources (links and buffers) in a number of ways, ranging from naive, arbitrary allocation to highly sophisticated and complex algorithms. Methods of higher complexity usually achieve better nonblocking performance at lower cost.

*  Broadband  Integrated  Services  Digital  Networks

Table  1  Overview  of  the  related  work.

| Type | Conditions | if/iff | Formula | Authors |
|------|------------|--------|---------|---------|
| CCS | $b = B = \beta = 1$ | iff | $M \ge 2L - 1$ | Clos |
| DBW | $B/b$, $1/b$ integers | iff | $M \ge 2\left\lfloor \frac{L - B}{1 - B + b} \right\rfloor + 1$ | Chung & Ross |
| CBW | $b < 0.5$ | if | $M \ge 2 \cdot \max_{[b,B]} \lfloor \cdots \rfloor + 1$ | Melen & Turner |
| CBW | $b > 0$, $B \in (1 - b, 1]$ | iff | $M \ge 2 \left\lfloor \frac{1}{b} \right\rfloor (L - 1) + 1$ | Chung & Ross |
| CBW | $b = 0$, $B \in (0, 1)$ | iff | $M \ge \lim_{\epsilon \to 0^+} 2\left\lfloor \frac{L - B}{1 - B} - \epsilon \right\rfloor + 1$ | Chung & Ross |
| CBW | $B/b$, $1/b$ integers | if | $2\left\lfloor \frac{L - B}{1 - B + b} \right\rfloor + 1 \le M^* \le 2\left\lfloor \frac{L - B}{1 - B} \right\rfloor + 1$ | Chung & Ross |

CCS:  Classic  Circuit  Switching,  DBW/CBW:  Discrete/Continuous  BandWidth


1.4  Earlier  Work 

Table 1 summarizes the results reported in the past for these networks, for both discrete and continuous bandwidth multirate traffic. The formulas in this table provide bounds for the minimum number of middle-stage switches required for strictly nonblocking (SNB) operation.

In 1953, Charles Clos (Clos, 1953) introduced the class of 3-stage Clos networks. He determined that the number of middle-stage switches, $M$, that is necessary and sufficient for strictly nonblocking operation is at least $2L - 1$, where $L$ is the number of input ports per first-stage switch. His result applies in the most general case of the classic circuit switching (CCS) environment, i.e., for $b = B = \beta = 1$.

The proof relies on the following worst case. The source input switch can have up to $L - 1$ input ports occupied, and each of the corresponding connections can be routed through a different middle-stage switch, thus occupying $L - 1$ such switches. For the same reason, the destination output switch may occupy $L - 1$ additional (possibly different) middle-stage switches. Therefore, in order to be able to route a new connection we need

$$(L - 1) + (L - 1) + 1 = 2L - 1.$$

In 1983, Jajszczyk (Jajszczyk, 1983) derived conditions for strict-sense nonblocking operation of two- and three-stage multiple-channel networks.

In 1989, Melen and Turner (Melen et al., 1989) introduced an elegant model for interconnection networks that also considers multirate traffic. Their analysis determined a lower bound on the number of middle-stage switches for the continuous bandwidth multirate environment with $b < 0.5$ (for $b > 0.5$ we have the CCS environment case). The main difference in the multirate environment is that more than one connection may now share a link's bandwidth. Therefore, the analysis of multirate networks must consider the available bandwidth instead of the available links.

In 1991, Chung and Ross (Chung et al., 1991) improved on the bounds derived by Melen and Turner. They derived necessary and sufficient conditions for various cases of the continuous bandwidth multirate environment, as well as for one case of the discrete bandwidth multirate environment (see Table 1).

In 1994, Collier and Curran (Collier et al., 1994) generalized Melen and Turner's conditions for three-stage multirate networks, including asymmetrical switch configurations.

Although Chung and Ross produced tighter bounds and conditions, they only considered the special case where the i/o ports have the same capacity as the internal links (i.e., equal to 1). Melen and Turner, instead, had considered the more general case in which the internal links have a different (higher) capacity than the i/o ports. Our work extends the results of Chung and Ross under this more general assumption of Melen and Turner. In addition, we derive results for cases not covered so far, such as the case $B \le 1 - b$.

The general methodology for deriving these results is to consider one first-stage switch (source) and one third-stage switch (destination), and to assume the worst-case scenario, as outlined above.

The  rest  of  this  paper  is  organized  as  follows.  Section  2  introduces  the  necessary  notation 
and  definitions.  Section  3  presents  a  number  of  theorems  and  corollaries  which  generalize 
earlier  work  for  various  cases  of  the  multirate  environment. 


2  PRELIMINARIES 

In  this  section,  we  present  the  notation  that  will  be  used  and  referred  to  throughout  this 
paper.  We  also  define  the  strictly  nonblocking  mode  of  switching  operation. 

Figure 2 illustrates the general class of asymmetrical three-stage Clos networks considered in this paper. The numbers of switching elements in the three stages are $F$, $M$ and $G$, respectively. We use $S1_i$, $S2_j$ and $S3_k$ to denote first-stage switch $i$, middle-stage switch $j$ and third-stage switch $k$, respectively, for $i = 1, \ldots, F$, $j = 1, \ldots, M$ and $k = 1, \ldots, G$. The interconnection pattern among the three stages is described below.

Stage 1: Switching element $S1_i$ has $P_i$ input ports. It is connected to switching element $S2_j$ with a link of capacity $R_{i,j}$. The link between $S1_i$ and $S2_j$ is denoted as $L1_{i,j}$.

Stage 3: Switching element $S3_k$ has $Q_k$ output ports. It is connected to switching element $S2_j$ with a link of capacity $T_{j,k}$. The link between $S2_j$ and $S3_k$ is denoted as $L2_{j,k}$.

The capacities $R_{i,j}$ and $T_{j,k}$ are normalized quantities such that

$$\min\left\{ \min_{\forall i,j} R_{i,j},\ \min_{\forall j,k} T_{j,k} \right\} = 1.$$
Figure  2  A  general  asymmetrical  Clos  network.


Each input or output link has a (normalized) capacity of $\beta \le 1$. Therefore, the total capacity of all the input ports to the network is $\mathcal{I} = \sum_{i=1}^{F} P_i \beta$. Similarly, the total capacity of all output ports of the network (i.e. outputs of the third stage) is $\mathcal{O} = \sum_{k=1}^{G} Q_k \beta$. The total link capacity, $\mathcal{C}$, from the first stage to the middle stage is given by

$$\mathcal{C} = \sum_{i=1}^{F} \sum_{j=1}^{M} R_{i,j}.$$

Similarly, the total link capacity, $\mathcal{T}$, between the middle and the third stages is given by

$$\mathcal{T} = \sum_{j=1}^{M} \sum_{k=1}^{G} T_{j,k}.$$
Therefore, the total capacity of any set of simultaneous connections in the network cannot exceed

$$\min\{\mathcal{I}, \mathcal{C}, \mathcal{T}, \mathcal{O}\}.$$
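These totals, and the resulting ceiling on simultaneous traffic, can be computed directly from the network description. The Python sketch below uses exact fractions and a hypothetical argument layout:

- `P[i]` and `Q[k]` are the port counts of $S1_i$ and $S3_k$;
- `R[i][j]` and `T[j][k]` are the link capacities;
- `beta` is the port capacity $\beta$.

It rejects capacities that violate the normalization (smallest link capacity equal to 1).

```python
from fractions import Fraction


def capacity_summary(P, Q, R, T, beta):
    """Return I, C, T, O and min{I, C, T, O} for an asymmetrical 3-stage network.

    P[i]: input ports of S1_i; Q[k]: output ports of S3_k;
    R[i][j]: capacity of L1_{i,j}; T[j][k]: capacity of L2_{j,k}.
    """
    smallest = min(min(min(row) for row in R), min(min(row) for row in T))
    if smallest != 1:
        raise ValueError("link capacities must be normalized so that the smallest is 1")
    I = beta * sum(P)                   # total input-port capacity
    O = beta * sum(Q)                   # total output-port capacity
    C = sum(sum(row) for row in R)      # stage 1 -> stage 2 link capacity
    Tt = sum(sum(row) for row in T)     # stage 2 -> stage 3 link capacity
    return {"I": I, "C": C, "T": Tt, "O": O, "max_total": min(I, C, Tt, O)}
```

Consider two first-stage switches with 2 and 3 ports, one third-stage switch with 4 ports, and $\beta = 1/2$. The output side ($\mathcal{O} = 2$) is then the binding limit, even though more internal link capacity is available.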

2.1  Definition  of  the  Strictly  Nonblocking  Mode 

Before  we  define  the  strictly  nonblocking  mode  of  operation  of  3-stage  Clos  networks,  we 
will  introduce  the  required  notation  for  this  definition.  For  this  purpose,  we  will  consider 
the more general multirate environment, since the classic circuit switching (CCS) definitions can then be obtained as special cases. Most of the notation presented in this section is taken from (Melen et al., 1989).

In the multirate environment, we assume that each connection request has a weight $\omega \in [b, B]$ for some $b \le B$. A multirate network is said to operate under the continuous bandwidth assumption if $\omega$ can take any real value in the interval $[b, B]$. If $\omega$ can only take a few discrete values in $[b, B]$, the network is said to operate in the discrete bandwidth case. A route of weight $\omega$ is a sequence of links forming a path from an input port to an output port, such that a bandwidth of $\omega$ is allocated on each link of this path. A route of weight $\omega$ from $x$ to $y$ realizes a request $[x \to y, \omega]$. A state is a set of routes that satisfy all the following conditions.

1. For every input or output port $x$, the sum of the weights of the routes that include $x$ is at most $\beta$.

2. For every link $L1_{i,j}$, the sum of the weights of all routes that use $L1_{i,j}$ is at most $R_{i,j}$.

3. For every link $L2_{j,k}$, the sum of the weights of all routes that use $L2_{j,k}$ is at most $T_{j,k}$.

We say that a state realizes a set of requests if there is a one-to-one and onto mapping from the set of requests to the set of routes in the state. The utilization of a link $l$ in a given state is the sum of the weights of all routes that include $l$.

A link or switch $y$ is said to be $\omega$-accessible in a given state from an input port $x$ if there is a path from $x$ to $y$ such that the weight on each link $l$ in the path is at most $C_l - \omega$, where $C_l$ is the capacity of link $l$. We say that a connection request $[x \to y, \omega]$ is compatible with a state $s$ if the weight on $x$ and on $y$ in $s$ is at most $\beta - \omega$.
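In a three-stage network, a route from $S1_i$ to $S3_k$ passes through exactly one middle-stage switch $S2_j$. The route fits exactly when link $L1_{i,j}$ carries at most $R_{i,j} - \omega$ and link $L2_{j,k}$ carries at most $T_{j,k} - \omega$.

The sketch below lists the middle-stage switches through which a request of weight $\omega$ could currently be routed, and tests whether the request is compatible with the port loads. The names are hypothetical, and loads and capacities are given as nested lists, ideally of exact fractions.

```python
from fractions import Fraction


def is_compatible(w, in_load, out_load, beta):
    """[x -> y, w] is compatible if both ports currently carry at most beta - w."""
    return in_load <= beta - w and out_load <= beta - w


def routable_middles(i, k, w, load1, load2, R, T):
    """Middle-stage switches j for which a weight-w route S1_i -> S2_j -> S3_k fits.

    load1[i][j], R[i][j]: current load and capacity of link L1_{i,j};
    load2[j][k], T[j][k]: current load and capacity of link L2_{j,k}.
    """
    return [j for j in range(len(R[i]))
            if load1[i][j] <= R[i][j] - w and load2[j][k] <= T[j][k] - w]
```

A compatible request for which `routable_middles` returns an empty list is a blocking situation. The strictly nonblocking definition requires that this never happen, in any state.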

Based  on  the  above  definitions,  we  can  now  define  the  Strictly  Nonblocking  (SNB)  mode 
of  operation  as  follows. 

Strictly Nonblocking Operation (SNB): A network is strictly nonblocking if, for every state $s$ and for every connection request $r$ compatible with $s$, there exists a route realizing $r$ in $s$. No specific control algorithm is assumed, and no rearrangements of existing connections may be performed.

Three other nonblocking modes are defined for 3-stage Clos switching networks;† in this paper, however, we concentrate on the strictly nonblocking mode of operation. Since the SNB mode is the most demanding one, all the results derived for the SNB mode are also sufficient conditions for the other three nonblocking modes.

† Wide-Sense Nonblocking (WSN), Semi-Rearrangeably Nonblocking (SRN), and Rearrangeably Nonblocking (RNB) mode. Only SRN and RNB involve rearrangements of existing connections.

3  THEORETICAL  ANALYSIS  FOR  SNB  OPERATION 

All  the  theoretical  results  presented  in  this  section  refer  to  the  Strictly  Nonblocking  (SNB) 
mode  of  operation  of  3-stage  Clos  networks.  Recall  that  in  this  mode  connections  are 
established  and  disconnections  are  performed  without  any  particular  algorithm.  We  derive 
results  for  both  discrete  and  continuous  bandwidth  cases. 

Observe that if a given interconnection network is SNB for the continuous bandwidth case, it is also SNB for the discrete bandwidth case. Studying the special case (i.e., the discrete bandwidth case) separately allows us to obtain better bounds, which are still useful in applications where the discrete bandwidth assumption is appropriate and adequate. Also, note that the CCS case corresponds to $b = B = 1$ in the discrete bandwidth case.

Throughout this section, we adopt the original notation used by Chung & Ross (Chung et al., 1991).

For the following analysis, assume a 3-stage Clos network with $N$ input and $N$ output ports. Each first-stage switch has $L$ inputs and $M$ outputs, and each third-stage switch has $M$ inputs and $L$ outputs. Therefore, each outer stage consists of $N/L$ switches, and there are $M$ middle-stage switches (see Figure 3).

In the multirate environment, recall that each connection request has a weight $\omega \in [b, B]$ for some $b \le B$. Also, each input and output port has a maximum capacity of $\beta$, and each internal link has a maximum capacity of 1 ($b$, $B$ and $\beta$ are normalized with respect to the internal link capacity).

Let $M^*$ be the minimum number of middle-stage switches needed for the 3-stage Clos network to be strictly nonblocking (with $N$ and $L$ fixed). It is well known (Benes, 1965) that $2L - 1$ middle-stage switches are required for strictly nonblocking operation in the CCS
case.  The  following  theorems  determine  the  number  of  middle-stage  switches  required  for 
strictly  nonblocking  operation  in  the  multirate  environment,  both  in  the  discrete  and  the 
continuous  bandwidth  cases. 

3.1  Already  existing  results 

The following theorems present the results summarized in Table 1; they refer to symmetric 3-stage Clos networks. All of these results are due to Chung & Ross (Chung et al., 1991), who improved on the original work of Melen & Turner (Melen et al., 1989). Chung & Ross assumed that $\beta = 1$, that is, that the input/output ports have the same capacity as the internal links. In the next subsections, we extend their results to the case where the internal links can be faster than the input ports (i.e., $\beta < 1$). We also generalize these results to asymmetrical Clos networks.

Discrete  Bandwidth  case 

Theorem 1 Assuming each connection weight is a multiple of $b$ and $1/b$ is an integer, a 3-stage Clos network is strictly nonblocking if and only if:

$$M \ge 2\left\lfloor \frac{L - B}{1 - B + b} \right\rfloor + 1 \qquad (1)$$

Figure  3  A  symmetric  three-stage  Clos  network.


Continuous  Bandwidth  case 

Theorem 2 For $b > 0$ and $B \in (1 - b, 1]$, a 3-stage Clos network is strictly nonblocking if and only if:

$$M \ge 2 \cdot \left\lfloor \frac{1}{b} \right\rfloor \cdot (L - 1) + 1 \qquad (2)$$


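Theorem 3 For $b = 0$ and $B \in (0, 1)$, a 3-stage Clos network is strictly nonblocking if and only if $M \ge M'$, where

$$M' \overset{\text{def}}{=} \lim_{\epsilon \to 0^+} 2 \cdot \left\lfloor \frac{L - B}{1 - B} - \epsilon \right\rfloor + 1 = \begin{cases} 2\left\lfloor \frac{L - B}{1 - B} \right\rfloor + 1 & \text{if } \frac{L - B}{1 - B} \text{ is not an integer}, \\[6pt] 2\left\lfloor \frac{L - B}{1 - B} \right\rfloor - 1 & \text{if } \frac{L - B}{1 - B} \text{ is an integer}. \end{cases}$$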
Theorem 4 For $B = k b$ with $k$ and $1/b$ integers,

$$2 \cdot \left\lfloor \frac{L - B}{1 - B + b} \right\rfloor + 1 \le M^* \le 2 \cdot \left\lfloor \frac{L - B}{1 - B} \right\rfloor + 1 \qquad (3)$$
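The bounds of Theorems 1 to 4 are easy to evaluate numerically. The following sketch transcribes the theorems' formulas for $\beta = 1$; the function names are hypothetical. It uses exact rational arithmetic (`fractions.Fraction`), so that the integrality test in Theorem 3 is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def m_thm1(L, b, B):
    """Theorem 1 (discrete bandwidth): M >= 2*floor((L - B)/(1 - B + b)) + 1."""
    return 2 * floor(Fraction(L - B) / (1 - B + b)) + 1


def m_thm2(L, b):
    """Theorem 2 (b > 0, B in (1 - b, 1]): M >= 2*floor(1/b)*(L - 1) + 1."""
    return 2 * floor(Fraction(1) / b) * (L - 1) + 1


def m_thm3(L, B):
    """Theorem 3 (b = 0, B in (0, 1)): the limit M', evaluated case by case."""
    x = Fraction(L - B) / (1 - B)
    return 2 * floor(x) - 1 if x.denominator == 1 else 2 * floor(x) + 1


def m_thm4_bounds(L, b, B):
    """Theorem 4 (B = k*b, k and 1/b integers): (lower, upper) bounds on M*."""
    return m_thm1(L, b, B), 2 * floor(Fraction(L - B) / (1 - B)) + 1
```

With $b = B = 1$, `m_thm1` reproduces Clos's $2L - 1$. For $L = 4$ and $B = 1/2$, $(L - B)/(1 - B) = 7$ is an integer, so Theorem 3 gives $M' = 13$ rather than 15.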


In the following theorems, we extend the above conditions by assuming that the internal links have a higher capacity than the input/output ports. In other words, we incorporate the speedup factor of the internal links into the existing formulas by assuming that the normalized i/o port capacity, β, is in general less than or equal to the internal link capacity (which is normalized to 1).

At this point, we observe that there is no tight bound for the continuous bandwidth case when B ∈ [0, 1 − b]. This case has so far been an open problem (Chung et al., 1991), and in this paper we provide a solution for it in Theorem 9.


3.2  Extensions  for  more  General  Multirate  Clos  Networks 


Assuming that the maximum i/o port capacity is β times the internal link capacity, where 0 ≤ b ≤ B ≤ β ≤ 1, we can normalize with respect to the i/o port capacity and apply the formulas derived by Chung and Ross:

$$0 \le \frac{b}{\beta} \le \frac{B}{\beta} \le 1 \le \frac{1}{\beta} \;\Longrightarrow\; 0 \le b' \le B' \le 1, \qquad b' = \frac{b}{\beta},\; B' = \frac{B}{\beta}$$


Discrete  Bandwidth  case 

Theorem 5 Assuming each connection weight is a multiple of b and 1/b is an integer, then a 3-stage Clos network is strictly nonblocking if and only if:

$$M \ge 2\left\lfloor \frac{\beta L - B}{1 - B + b} \right\rfloor + 1 \qquad (4)$$

when βL ≥ 1 + b. Otherwise, M ≥ 1.
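As an illustration only, the bound (4) can be evaluated directly. The sketch below assumes Python's `fractions.Fraction` for exact arithmetic, because floating point can misplace the floor when the ratio is exactly an integer, as it is for the Chung & Ross values used here. The function name `discrete_min_middle` is hypothetical. With β = 1 the sketch reduces to the original Chung & Ross setting.

```python
from fractions import Fraction
from math import floor

def discrete_min_middle(L, b, B, beta=1):
    """Smallest M allowed by Theorem 5, Eq. (4) (illustrative sketch).

    Assumes weights are multiples of b, 1/b is an integer and
    0 < b <= B <= beta <= 1 (internal link capacity normalised to 1).
    """
    b, B, beta = Fraction(b), Fraction(B), Fraction(beta)
    if beta * L < 1 + b:
        # the whole input load of a first-stage switch fits on one internal link
        return 1
    return 2 * floor((beta * L - B) / (1 - B + b)) + 1

# Chung & Ross parameters: L = 5, b = 0.1, B = 0.8, beta = 1
print(discrete_min_middle(5, Fraction(1, 10), Fraction(4, 5)))  # 29
```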


Proof  of  Theorem  5 

Sufficiency 

Assume an incoming connection of weight ω. Since 1/b and ω/b are integers, the minimum load of an internal link that would block this connection is 1 − ω + b. The maximum utilized bandwidth on the input links that would still allow the incoming connection to be established is βL − ω. Therefore, the number of internal links that can be saturated with this load is at most f(ω) = ⌊(βL − ω)/(1 − ω + b)⌋. For βL ≥ 1 + b, f(ω) is a non-decreasing function of ω and is therefore maximized for ω = B in the interval [b, B]. For βL < 1 + b, f(ω) is a decreasing function of ω and is therefore maximized for ω = b in the interval [b, B]. By similar reasoning, the same number of middle-stage switches can be blocked by a third-stage switch. Hence twice as many middle-stage switches plus one suffice to ensure nonblocking operation in the worst case. ∎

Necessity 

Consider a configuration with 2(βL − B)/b connections, each with weight b. Half of them use the same first-stage switch u and the same third-stage switch z, and they contribute a weight of at least 1 − B + b to each of ⌊(βL − B)/(1 − B + b)⌋ middle-stage switches. The remaining half of the connections use, in a similar way, first-stage switch w ≠ u and third-stage switch v ≠ z. In the worst case, the two sets of middle-stage switches are disjoint. An incoming connection (u, v, B) would then require at least one more middle-stage switch to get through.

In the case where βL < 1 + b, the entire input capacity of a first-stage switch fits in a single internal link. Hence, M = 1. ∎


Continuous  Bandwidth  case 

Theorem 6 For b > 0 and B ∈ (1 − b, β], a 3-stage Clos network is strictly nonblocking if and only if:

$$M \ge 2\left\lfloor \frac{\beta}{b} \right\rfloor (L-1) + 1 \qquad (5)$$
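The following is a minimal sketch of (5), using exact fractions. It checks the theorem's range condition before evaluating the bound. The function name is hypothetical, and the example values L = 5, b = 0.3, B = 0.8, β = 1 were chosen only because they satisfy B ∈ (1 − b, β].

```python
from fractions import Fraction
from math import floor

def continuous_large_B_min_middle(L, b, B, beta=1):
    """Smallest M allowed by Theorem 6, Eq. (5): 2*floor(beta/b)*(L-1) + 1."""
    b, B, beta = Fraction(b), Fraction(B), Fraction(beta)
    if not (b > 0 and 1 - b < B <= beta):
        raise ValueError("Theorem 6 requires b > 0 and 1 - b < B <= beta")
    # each port other than the first holds at most floor(beta/b) connections,
    # and each of them can block one internal link on its own
    return 2 * floor(beta / b) * (L - 1) + 1

print(continuous_large_B_min_middle(5, Fraction(3, 10), Fraction(4, 5)))  # 25
```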


Proof  of  Theorem  6 

Sufficiency 

Assume a 3-stage Clos network with a single first-stage and a single third-stage switch, each with L ports. Without loss of generality, assume that the first input port carries a weight ≤ β − ω, that is, it will allow an incoming connection with weight ω. Let J(ω, L) be the maximum number of internal links that have a weight > 1 − ω. First, we will show that

$$J(\omega, L) \le \left\lfloor \frac{\beta}{b} \right\rfloor (L-1).$$

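Let R be a configuration of connections. For each connection r ∈ R, let a_r be its weight. Let G_l be the set of connections in R that pass through the l-th input port. Thus, {G_1, …, G_L} is a partition of R. Also, let 𝒥 be the set of all internal links with weight > 1 − ω, so that J = |𝒥|. Let H_j be the set of connections in R that pass through the j-th link in 𝒥. Then:

$$\sum_{r \in G_1} a_r \le \beta - \omega \qquad (6)$$

$$\sum_{r \in G_l} a_r \le \beta, \quad l = 2, \ldots, L \qquad (7)$$

$$\sum_{r \in H_j} a_r > 1 - \omega, \quad j \in \mathcal{J} \qquad (8)$$

$$a_r \in [b, B], \quad r \in R \qquad (9)$$

Let $G \stackrel{\text{def}}{=} \bigcup_{l=2}^{L} G_l$. Then,

$$|H_j \cap G| \ge 1, \quad j \in \mathcal{J} \qquad (10)$$

Otherwise, there exists j ∈ 𝒥 such that H_j ⊆ G_1. This gives Σ_{r∈H_j} a_r ≤ Σ_{r∈G_1} a_r ≤ β − ω, which contradicts (8), since β ≤ 1. Each connection leaves the first-stage switch on a single internal link, so the sets H_j are pairwise disjoint. Therefore, from (10) we have

$$J \le \sum_{j \in \mathcal{J}} |H_j \cap G| \le |G| \qquad (11)$$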
From (7) and (9) we have |G_l| ≤ ⌊β/b⌋, l = 2, …, L. For ω = B, (6) implies |G_1| ≤ ⌊(β − B)/b⌋. Hence,

$$|G| = \sum_{l=2}^{L} |G_l| \le \left\lfloor \frac{\beta}{b} \right\rfloor (L-1) \qquad (12)$$

From (11) and (12), we conclude that J(ω, L) ≤ ⌊β/b⌋(L − 1). The same bound applies to the third-stage switch, so 2⌊β/b⌋(L − 1) + 1 middle-stage switches suffice for nonblocking operation. ∎

Necessity 

For every B ∈ (1 − b, β], a single connection (even of weight b) is enough to block an internal link, since in the worst case b + B > 1. Consider a configuration consisting of ⌊β/b⌋(L − 1) connections of weight b at the first-stage switch, and as many at the third-stage switch, with exactly one connection saturating each internal link it uses and the two sets of middle-stage switches disjoint. Then, a new connection of weight B will be blocked unless we have at least 2⌊β/b⌋(L − 1) + 1 middle-stage switches. ∎


Theorem 7 For b = 0, B ∈ (0, β) and β ≤ 1, a 3-stage Clos network is strictly nonblocking if and only if:

$$M \ge 2\left\lceil \frac{\beta L - 1}{1 - B} \right\rceil + 1 = 2\left\lceil \frac{\beta L - B}{1 - B} \right\rceil - 1 \qquad (13)$$


Proof  of  Theorem  7 

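Let M* be the minimum number of middle-stage switches for the 3-stage Clos network to be strictly nonblocking. We observe that the continuous bandwidth case with b = 0 can be solved by considering the discrete bandwidth case with arbitrarily small b. As b → 0⁺, the ratio (βL − B)/(1 − B + b) increases towards (βL − B)/(1 − B) from below, so its floor tends to one less than the ceiling of that limit. Therefore, from Theorem 5 and the observation that

$$\lim_{b \to 0^+} \left\lfloor \frac{\beta L - B}{1 - B + b} \right\rfloor = \left\lceil \frac{\beta L - B}{1 - B} \right\rceil - 1,$$

we have

$$M^* = 2 \lim_{b \to 0^+} \left\lfloor \frac{\beta L - B}{1 - B + b} \right\rfloor + 1 = 2\left\lceil \frac{\beta L - B}{1 - B} \right\rceil - 1 = 2\left\lceil \frac{\beta L - 1}{1 - B} \right\rceil + 1. \qquad (14)$$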
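The limit in (14) can be checked numerically. The sketch below evaluates the discrete bound of Theorem 5 for shrinking b and compares it with the b = 0 value. The helper names are hypothetical, and exact fractions avoid rounding at the integer boundary that occurs for L = 5, B = 0.8, β = 1.

```python
from fractions import Fraction
from math import ceil, floor

def discrete_bound(L, b, B, beta):
    # Theorem 5, Eq. (4): 2*floor((beta*L - B)/(1 - B + b)) + 1
    return 2 * floor((beta * L - B) / (1 - B + b)) + 1

def continuous_bound(L, B, beta):
    # Theorem 7, Eq. (14): 2*ceil((beta*L - 1)/(1 - B)) + 1
    return 2 * ceil((beta * L - 1) / (1 - B)) + 1

L, B, beta = 5, Fraction(4, 5), Fraction(1)
for k in (1, 3, 6):
    b = Fraction(1, 10**k)
    print(f"b = 1e-{k}: discrete bound = {discrete_bound(L, b, B, beta)}")
print("b = 0 limit:", continuous_bound(L, B, beta))
```

For b = 0.1 the discrete bound is still 29. Already at b = 0.001 it has reached the b = 0 value of 41.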


Theorem 8 For B = k·b and k, β/b integers,

$$2\left\lfloor \frac{\beta L - B}{1 - B + b} \right\rfloor + 1 \;\le\; M^* \;\le\; 2\left\lceil \frac{\beta L - B}{1 - B} \right\rceil - 1 \qquad (15)$$


Proof  of  Theorem  8 

The lower bound is derived from Theorem 5, if we consider the discrete bandwidth case as a sub-case of the continuous bandwidth case (necessary condition).
Figure 4 Illustration to explain the proof of Theorem 9 when 1/b, B/b and β/b are integers (panels: input-port capacity map; internal-link capacity map).


The upper bound is derived from the observation that every configuration of connections with ω ∈ [b, B], b > 0, is also a configuration of connections with ω ∈ (0, B]. Therefore, the upper bound is the exact bound of Theorem 7 (sufficient condition). ∎
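The two bounds of (15) are easy to evaluate. The short sketch below uses a hypothetical function name and Python fractions for exact arithmetic. It reproduces the interval from 29 to 41 for the Chung & Ross parameters L = 5, b = 0.1, B = 0.8 and β = 1.

```python
from fractions import Fraction
from math import ceil, floor

def theorem8_bounds(L, b, B, beta=1):
    """(lower, upper) bounds on M* from Eq. (15).

    Assumes B = k*b with k and beta/b integers; the upper bound is the
    b = 0 value of Theorem 7.
    """
    lower = 2 * floor((beta * L - B) / (1 - B + b)) + 1
    upper = 2 * ceil((beta * L - B) / (1 - B)) - 1
    return lower, upper

print(theorem8_bounds(5, Fraction(1, 10), Fraction(4, 5)))  # (29, 41)
```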


The following theorem gives the necessary and sufficient condition for strictly nonblocking operation of 3-stage Clos networks in the continuous-bandwidth multirate environment, for a special case which has been identified as an open research problem by Chung & Ross (Chung et al., 1991).

Theorem 9 Assuming that b > 0, B < 1, and either B > 1 − b or all of 1/b, B/b and β/b are integers, then a 3-stage Clos network is strictly nonblocking in the continuous-bandwidth multirate environment if and only if


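$$M \ge 2\varphi + 1$$

where

$$\varphi = \begin{cases} \left\lfloor \dfrac{\beta}{b} \right\rfloor (L-1), & \text{if } B > 1 - b,\\[2mm] \left\lfloor \dfrac{\beta L - B - bA}{1 - B} \right\rfloor, & \text{otherwise,} \end{cases} \qquad (16)$$

$$A = \left\lceil \frac{b\,\varphi}{\beta - b} \right\rceil \qquad (17)$$

Proof of Theorem 9

Let us first consider the case when B > 1 − b. This implies

$$\beta - B \le 1 - B < b$$

and the theorem becomes equivalent to Theorem 6.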

Let us now consider the case when all of 1/b, B/b and β/b are integers (refer to Figure 4). First, we show a constructed configuration that achieves the bound suggested by the theorem (necessity). We then show that no other configuration can exceed this bound (sufficiency). Therefore, the bound is exact.
Necessity:

Consider a configuration in which all connections have weight either b or b + ε, where ε is some arbitrarily small positive number. Let us call these connections SmallBlock and BigBlock, respectively.

First, we saturate a number of internal links (IL-bars) by routing through each of them one BigBlock and (1 − B − b)/b SmallBlock(s). Then, we "tile" the input port capacity map as shown in Figure 4. Because an integer number of BigBlocks does not fit exactly in an input-port bar (IP-bar), there is some unused capacity in the input ports, besides the space allocated on the first input port for the next connection. Since we lose exactly one SmallBlock for each IP-bar that contains at least one BigBlock, the total number of SmallBlocks lost relative to a perfect tiling is given by

$$A = \left\lceil \frac{b\,\varphi}{\beta - b} \right\rceil \qquad (18)$$


The total capacity used in the input ports (i.e., the tiled area) is equal to the total capacity used in the internal links. Therefore, the maximum number of saturated links is:

$$\varphi = \left\lfloor \frac{\beta L - B - bA}{1 - B} \right\rfloor \qquad (19)$$

Hence, the condition M ≥ 2φ + 1 is necessary.

Sufficiency: 

Let us now consider some configuration ℛ of arbitrary-weight connections, with total used input capacity U_ℛ. Also, let U_C be the total used input capacity of the constructed configuration C above. Then, U_C = (βL − B − bA)/(1 − B).

Clearly, if U_ℛ ≤ U_C, then ℛ does not require more than 2φ + 1 middle-stage switches for strictly nonblocking operation.

If U_ℛ > U_C, then the only input ports that can be further loaded in ℛ are the ones that contain at least one BigBlock in C. Therefore, in the worst case, the difference U_ℛ − U_C > 0 can be distributed by simply augmenting the BigBlocks by an extra weight w ∈ [0, min{b, B − b}). Since all saturated links in C can augment their BigBlocks by any weight less than B, we conclude that ℛ will not require more middle-stage switches than C for strictly nonblocking operation. ∎

One method to solve the co-dependent equations (16) and (17) for φ is as follows:

1. Set A = 0 in (16) and compute an upper bound for φ.

2. Compute A from (17).

3. If the computed values of A and φ do not satisfy both equations, let φ := φ − 1 and go to step 2.

The above method converges to the solution within a small number of iterations, since (i) a solution always exists by construction (see proof), and (ii) the value of A is usually small.
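The three steps above can be sketched as a short loop for the integer case. The sketch is based on our reading of the two coupled equations. We assume φ = ⌊(βL − B − bA)/(1 − B)⌋, and A = ⌈bφ/(β − b)⌉ as the number of input ports that hold a BigBlock. In step 3, "satisfied" is read as the feasibility check φ ≤ ⌊(βL − B − bA)/(1 − B)⌋. The function name is hypothetical, and the sketch requires β > b.

```python
from fractions import Fraction
from math import ceil, floor

def theorem9_phi(L, b, B, beta=1):
    """Iterative solution for phi in Theorem 9 (illustrative sketch).

    Assumed forms (our reading of the integer case):
      phi = floor((beta*L - B - b*A) / (1 - B))   # saturated internal links
      A   = ceil(b*phi / (beta - b))              # input ports holding a BigBlock
    """
    b, B, beta = Fraction(b), Fraction(B), Fraction(beta)
    if B > 1 - b:
        return floor(beta / b) * (L - 1)       # reduces to Theorem 6
    if beta <= b:
        raise ValueError("this sketch needs beta > b")
    phi = floor((beta * L - B) / (1 - B))      # step 1: A = 0 gives an upper bound
    while phi >= 0:
        A = ceil(b * phi / (beta - b))         # step 2
        # step 3: accept phi once the tiled input capacity can carry it
        if phi <= floor((beta * L - B - b * A) / (1 - B)):
            return phi
        phi -= 1
    return 0

phi = theorem9_phi(5, Fraction(1, 10), Fraction(4, 5))
print(phi, 2 * phi + 1)  # 19 39
```

Starting from φ = 21, the loop rejects 21 and 20 (both need A = 3, which leaves room for only 19 saturated links) and stops at φ = 19, i.e. M* = 39.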
In the special case where all of 1/b, B/b and β/b are integers, Equation (16) of Theorem 9 becomes

$$M \ge 2\varphi + 1 = 2\left\lfloor \frac{L\beta - B - bA}{1 - B} \right\rfloor + 1 \qquad (20)$$


The following example, presented in Chung & Ross (Chung et al., 1991), illustrates such a case.


Example 1 Consider a symmetric 3-stage Clos network with L = 5 links per input switch, b = 0.1, B = 0.8 and β = 1. For this case, Theorem 8 bounds the minimum number of middle-stage switches required for strictly nonblocking operation between 29 and 41. Theorem 9 gives M* = 39, which verifies the result reported by Chung & Ross (Chung et al., 1991). ∎

Corollary 1 For every b > 0 and B < 1, the bound established in Theorem 9 is a lower bound for the minimum number of middle-stage switches required for strictly nonblocking operation.

The  proof  follows  from  the  fact  that  the  cases  of  Theorem  9  are  subcases  of  the  corollary. 

3.3  Asymmetrical  Clos  Networks 

The following theorems extend the results presented in Section 3.2 to asymmetrical 3-stage Clos networks.

Let $R_i \stackrel{\text{def}}{=} \min_j R_{ij}$ and $T_k \stackrel{\text{def}}{=} \min_j T_{jk}$.

Discrete  Bandwidth  case 

Theorem 10 Assuming each connection weight is a multiple of b and R_i/b, T_k/b are integers, then an asymmetrical 3-stage Clos network is strictly nonblocking if:

$$M \ge \max_i \left\lfloor \frac{\beta P_i - B}{R_i - B + b} \right\rfloor + \max_k \left\lfloor \frac{\beta Q_k - B}{T_k - B + b} \right\rfloor + 1 \qquad (21)$$

when βP_i ≥ 1 + b and βQ_k ≥ 1 + b. Otherwise, M ≥ 1.

Proof  of  Theorem  10 

In the more general case of an asymmetrical Clos network, each first-stage switch i has P_i input ports and each third-stage switch k has Q_k output ports. In this case, the condition of Theorem 5 must hold for every first-stage and third-stage switch. Therefore, we can apply a proof similar to that of Theorem 5. Let L_1 = max_i{P_i} for the first-stage switches and L_2 = max_k{Q_k} for the third-stage switches. Now, assume a symmetric switch consisting of the worst-case combination of one input switch with L_1 input ports, one output switch with L_2 output ports, and internal links with capacities R_i and T_k between the three stages, respectively. To simplify our results, we used R_i = min_j R_{ij} and T_k = min_j T_{jk} in the denominators. This simplifying assumption yields the sufficient condition given in the theorem. ∎
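As a sketch only, (21) can be evaluated per switch and then maximised. The inputs are lists of port counts P_i and Q_k together with the per-switch minimum internal-link capacities R_i and T_k. The function name is hypothetical. The "Otherwise, M ≥ 1" branch is not modelled; the sketch raises instead when some βP_i or βQ_k falls below 1 + b. With identical switches and unit link capacities it reduces to Theorem 5.

```python
from fractions import Fraction
from math import floor

def theorem10_min_middle(P, Q, R, T, b, B, beta=1):
    """Sufficient number of middle-stage switches from Eq. (21) (sketch).

    P[i]: input ports of first-stage switch i; R[i]: its smallest
    internal-link capacity (min_j R_ij). Q[k], T[k]: the same for
    third-stage switch k.
    """
    b, B, beta = Fraction(b), Fraction(B), Fraction(beta)
    if min(P) * beta < 1 + b or min(Q) * beta < 1 + b:
        raise ValueError("sketch assumes beta*P_i >= 1 + b and beta*Q_k >= 1 + b")
    first = max(floor((beta * p - B) / (r - B + b)) for p, r in zip(P, R))
    third = max(floor((beta * q - B) / (t - B + b)) for q, t in zip(Q, T))
    return first + third + 1

# symmetric check (reduces to Theorem 5): 14 + 14 + 1
print(theorem10_min_middle([5, 5], [5, 5], [1, 1], [1, 1],
                           Fraction(1, 10), Fraction(4, 5)))  # 29
```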
Continuous Bandwidth case

Theorem 11 For b > 0 and B ∈ (1 − b, β], an asymmetrical 3-stage Clos network is strictly nonblocking if:

$$M \ge \left\lfloor \frac{\beta}{b} \right\rfloor \left( \max_i\{P_i\} + \max_k\{Q_k\} - 2 \right) + 1 \qquad (22)$$


Theorem 12 For b = 0 and B ∈ (0, β), an asymmetrical 3-stage Clos network is strictly nonblocking if:

$$M \ge \max_i \left\lceil \frac{\beta P_i - B}{R_i - B} \right\rceil + \max_k \left\lceil \frac{\beta Q_k - B}{T_k - B} \right\rceil - 1 \qquad (23)$$


Theorem 13 For B = k·b and k, β/b integers, in an asymmetrical 3-stage Clos network:

$$\max_i \left\lfloor \frac{\beta P_i - B}{R_i - B + b} \right\rfloor + \max_k \left\lfloor \frac{\beta Q_k - B}{T_k - B + b} \right\rfloor + 1 \;\le\; M^* \;\le\; \max_i \left\lceil \frac{\beta P_i - B}{R_i - B} \right\rceil + \max_k \left\lceil \frac{\beta Q_k - B}{T_k - B} \right\rceil - 1 \qquad (24)$$

The proofs of Theorems 11-13 are similar to the proofs of Theorems 6-8, combined with the observations in the proof of Theorem 10. ∎
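A minimal sketch of the bounds in (24), under the same conventions as Theorem 10: port counts per switch and per-switch minimum link capacities R_i and T_k. The function name is hypothetical. For a single symmetric switch pair with unit capacities it reproduces the Theorem 8 interval.

```python
from fractions import Fraction
from math import ceil, floor

def theorem13_bounds(P, Q, R, T, b, B, beta=1):
    """(lower, upper) bounds on M* from Eq. (24) (illustrative sketch)."""
    b, B, beta = Fraction(b), Fraction(B), Fraction(beta)
    lower = (max(floor((beta * p - B) / (r - B + b)) for p, r in zip(P, R))
             + max(floor((beta * q - B) / (t - B + b)) for q, t in zip(Q, T)) + 1)
    upper = (max(ceil((beta * p - B) / (r - B)) for p, r in zip(P, R))
             + max(ceil((beta * q - B) / (t - B)) for q, t in zip(Q, T)) - 1)
    return lower, upper

# symmetric sanity check: reduces to Theorem 8 for L = 5, b = 0.1, B = 0.8
print(theorem13_bounds([5], [5], [1], [1], Fraction(1, 10), Fraction(4, 5)))  # (29, 41)
```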


4  CONCLUDING  REMARKS 

In this paper we studied the strictly nonblocking switching operation of 3-stage Clos networks in the multirate environment.

First, we generalized the results of Table 1.4 to the case when the internal links have higher bandwidth than the input/output ports, by introducing the internal-link speedup factor (1/β). Subsequently, we extended these results to general asymmetrical Clos networks. In addition to these extensions, we contributed a general tight bound for multirate traffic where 1/b, B/b and β/b are all integers and B is independent of b. This formula also generates earlier reported bounds as special cases.
Abstract

The  principle  of  Maximum  Entropy  (ME)  and  the  notion  of  system  decomposition  are 
combined  towards  the  creation  of  an  iterative  cost-effective  approximation  algorithm  for  the 
performance  analysis  of  packet-switched  buffered  Banyan  Multistage  Interconnection  Network 
(MIN)  based  Asynchronous  Transfer  Mode  (ATM)  switch  architectures  with  arbitrary  buffer 
sizes,  multiple  input/output  ports  and  Repetitive  Service  (RS)  internal  blocking. 

Traffic  entering  and  flowing  in  the  MIN  is  assumed  to  be  bursty  and  it  is  modelled  by  a 
Compound  Poisson  Process  (CPP)  with  geometrically  distributed  bulk  sizes  and  Generalised 
Exponential  (GE)  interarrival  times.  The  GE  distribution  is  also  adopted  to  represent  the 
random  nature  of  the  effective  service  times  of  packets  due  to  the  combined  effects  of  traffic 
burstiness  and  RS  blocking. 

Entropy  maximisation  implies  decomposition  of  the  Banyan  network  into  individual  building
block  queues  of  switching  elements,  represented  by  shared  buffer  crossbars,  under  revised
GE-type  interarrival  and  service  times.  Each  building  block  queue  is  analysed  in  isolation  by 
applying  ME  techniques  and  classical  queueing  theory,  subject  to  marginal  mean  value 
constraints,  in  order  to  obtain  a  product  form  solution  for  the  joint  queue  length  distribution 
and  typical  performance  metrics  of  the  network. 

Numerical  results  are  included  to  validate  the  credibility  of  the  ME  approximation  against 
simulation,  define  experimental  performance  bounds  and  perform  a  buffer  capacity 
optimisation  across  the  entire  network. 


* Supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, under grant
GR/K/67809. 
Keywords

Multistage Interconnection Network (MIN), Banyan network, Queueing Network Model (QNM), Repetitive-Service (RS) blocking mechanism, Maximum Entropy (ME) Principle, Compound Poisson Process (CPP), Generalised Exponential (GE) distribution, Asynchronous Transfer Mode (ATM) switch architectures.

1  INTRODUCTION 

During the past decade, a considerable amount of effort has been made towards the design and development of Asynchronous Transfer Mode (ATM) switch architectures. These are widely considered the preferred packet-oriented solution for a new generation of high speed communication systems, both for broadband public information highways and for local and wide area private networks (e.g., Tobagi [25]).

Amongst the many types of ATM switch architectures, of particular interest are the so-called space division switches, which are primarily based on Multistage Interconnection Networks (MINs) (e.g., [1, 2, 19]). Such switches are composed of smaller switching elements represented by shared-buffer crossbars. The main features of a MIN include non-centralised switching control and multiple concurrent paths in tandem from input ports to output ports.

MINs are also widely employed in parallel processing systems as a means of processor-memory (and interprocessor) communication. The nature of traffic in ATM switches, however, is quite different from that observed in typical parallel machines. In the latter there is basically only one type of service, namely high speed data (not counting the "probe" and "acknowledgment" signals observed in inter-stage transmissions). In the former there is a greater variety of integrated services, including voice, low and high speed data, teleconferencing, TV distribution and video on demand, all of which share the same communication medium with different cell loss and delay requirements.

The integration of such ATM services implies considerable variability in terms of transmission speed and holding times. Moreover, the flow of cells through one switching element may be momentarily blocked (halted) if the downstream switching element has reached its buffer capacity. Thus, credible analytical tools are essential for the cost-effective performance modelling and prediction of such complex ATM switches.

An increasing number of papers concerned with the performance modelling and analysis of MINs have appeared in the literature (e.g., [4-6, 8, 20, 24, 26]), and this trend is likely to continue towards the design and development of more appropriate ATM space division architectures. In this context, analytic performance models of shared buffer ATM switch architectures, based on both continuous-time and discrete-time queueing models, have received particular attention. Pinto and Harrison [4, 5] proposed approximate algorithms for the analysis of continuous-time asynchronous buffered Banyan networks with 2×2 switching elements. They used Exponential interarrival times with 2-phase Coxian (C₂) and Generalised Exponential (GE) service time distributions, respectively, and Blocking After Service (BAS), i.e., service is suspended at the output port for a cell which attempts to enter a destination switching element with a full buffer. Hong et al. [6] and Yamashita et al. [26] described approximate algorithms for the performance evaluation of discrete-time and continuous-time queueing models of shared buffer ATM switches under Interrupted Bernoulli and Interrupted Poisson arrival processes, respectively. In terms of computational implementation, these works tackle the problem either by solving global balance equations numerically [4, 5] or by decomposing the switch into several subsystems, each of which is analysed numerically in isolation [6, 26]. However, as the number of input (or output) ports increases, so does the size of the system's state space. Further approximations are therefore required in order to achieve tractable solutions, if these are possible at all. Thus, there is a great need for alternative methodologies leading to both accurate and cost-effective approximations for the performance modelling and evaluation of MIN-based shared buffer ATM switches.

The principle of Maximum Entropy (ME), a probability inference method (cf. Jaynes [7], Shore and Johnson [22]), has been used successfully, in conjunction with queueing theoretic mean value constraints, for the approximate analysis of both continuous-time and discrete-time arbitrary Queueing Network Models (QNMs) with single general queues of finite or infinite capacity (e.g., [10-17]). In particular, the principle has been utilised in the study of general multibuffered and shared buffer queues. Closed-form expressions in both the continuous-time and discrete-time domains have been obtained for Queue Length Distributions (QLD), Cell Loss Probabilities (CLP) and mean delays [14, 15]. More recently, a new product-form approximation has been established by Kouvatsos and Wilkinson [17] towards the cost-effective performance analysis of arbitrary open discrete-time QNMs of shared buffer queues with cell loss. In the aforementioned studies, the arrival process at each queue was assumed to be highly variable and was modelled by Compound Poisson (CPP) or Bernoulli (CBP) processes, both with geometrically distributed bulk sizes. In this context, the burstiness of the arrival process is characterised by the squared coefficient of variation (SCV) of the interarrival times or, equivalently, by the average size of the incoming bulk. The CPP and CBP arrival processes imply GE and Generalised Geometric (GGeo) interarrival-time distributions, respectively, whose pseudo-memoryless properties facilitate the analysis of complex queues and networks (e.g., [11, 13, 16]). The choice of GE and GGeo distributions has been further motivated by the fact that measurements of actual traffic or service times are generally limited, so only a few parameters can be computed reliably. Typically, only the mean and variance can be relied upon. In this case, the choice of distributions which imply least bias (cf. [7]), i.e., the least introduction of arbitrary and therefore false assumptions, is that of a GE or GGeo distribution within a continuous-time or a discrete-time context, respectively.

In this paper, queueing network modelling and entropy maximisation are employed towards the performance analysis of Banyan MINs under Repetitive-Service (RS) (or communication) internal blocking. The MINs considered have GE-type external traffic patterns and stage-to-stage transmission times, arbitrary switching element sizes (R×R, R ≥ 2) and arbitrary buffer capacities, K. Such MINs provide full connectivity between a set of input sources and a set of destination nodes. In a Broadband Integrated Services Digital Network (B-ISDN) environment, Banyan MINs can support several different types of traffic concurrently (e.g., data, voice, video). Consequently, traffic models must be able to capture various flow characteristics such as burstiness (e.g., video traffic, which has to be batched). In this context, the GE distribution is adopted to represent the random nature of the interarrival times and effective service times of packets in the MIN, due to the combined influence of traffic burstiness and RS blocking. Note that in tandem configurations RS blocking occurs when a cell, upon service completion at queue k, attempts to join a downstream queue ℓ whose buffer is full. Consequently, the cell is rejected by queue ℓ and immediately receives another service at queue k. This is repeated until the cell completes service at queue k at a moment when the destination queue ℓ is not full.

Entropy maximisation implies a decomposition of the Banyan network into individual multiple-input GE-type shared buffer queues of switching elements with revised (effective) interarrival and transmission times. These queues are solved in isolation. Together with GE-type formulae for the first two moments of the cell interdeparture and aggregated arrival processes at each output port queue, they serve as cost-effective building blocks for the performance analysis of the entire network.

The ME formalism is introduced in Section 2. The GE-type distribution is described in Section 3. An ME QLD of a multiple-input shared buffer building block queue is outlined in Section 4. Section 5 presents an ME product-form approximation for an arbitrary QNM of a buffered Banyan MIN, together with a description of the traffic flow through the switching elements. Section 6 presents the ME analysis of three types of switching elements acting as building blocks, together with appropriate GE flow formulae. Section 7 presents the ME approximation algorithm for the performance analysis of arbitrary-size Banyan networks. Numerical results and concluding comments follow in Sections 8 and 9, respectively.

2  MAXIMUM  ENTROPY  FORMALISM 

Consider a system Q which has a set S of possible discrete states {S_0, S_1, S_2, ...}, which may be finite or countably infinite, and where state S_n, n = 0, 1, 2, ..., may be specified arbitrarily. Suppose that the available information about Q places a number of constraints on p(S_n), the probability distribution that the system Q is in state S_n. Without loss of generality, it is assumed that these constraints take the form of mean values of suitable functions {f_1(S_n), f_2(S_n), ..., f_m(S_n)}, where m is less than the number of possible states. The principle of maximum entropy [7] states that, of all distributions which satisfy the constraints, the minimally biased distribution is the one which maximises the system's entropy function


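$$H(p) = -\sum_{S_n \in S} p(S_n) \log p(S_n) \qquad (2.1)$$

subject to the constraints

$$\sum_{S_n \in S} p(S_n) = 1, \qquad (2.2)$$

$$\sum_{S_n \in S} f_k(S_n)\, p(S_n) = \langle f_k \rangle, \quad k = 1, 2, \ldots, m, \qquad (2.3)$$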

where {⟨f_k⟩, k = 1, 2, ..., m} are the prescribed mean values defined on the set of m functions {f_k(S_n): k = 1, 2, ..., m}. The maximisation of (2.1), subject to the constraints (2.2) and (2.3), can be carried out using Lagrange's method of undetermined multipliers and leads to the solution


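$$p(S_n) = \frac{1}{Z} \exp\left\{ -\sum_{k=1}^{m} \beta_k f_k(S_n) \right\}, \quad S_n \in S, \qquad (2.4)$$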


where {β_k, k = 1, 2, ..., m} are the Lagrangian multipliers determined from the set of constraints (2.3), and Z, known in statistical physics as the "partition function", is given by


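$$Z = e^{\beta_0} = \sum_{S_n \in S} \exp\left\{ -\sum_{k=1}^{m} \beta_k f_k(S_n) \right\} \qquad (2.5)$$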


where β_0 is the Lagrangian multiplier determined by the normalisation constraint (2.2).
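Here is a generic illustration of (2.4) and (2.5). It is not the shared-buffer ME solution developed later in the paper. Consider a single queue with states n = 0, ..., N where only normalisation and the mean ⟨n⟩ are known. The ME distribution is then a truncated geometric, p(n) = exp(−β_1 n)/Z, and its single Lagrangian multiplier can be found numerically. The function name, the bisection bracket [−50, 50] and the tolerance are our own choices.

```python
import math

def max_entropy_mean(N, mean, tol=1e-12):
    """ME distribution on states n = 0..N given normalisation and <n> only.

    With the single constraint f_1(n) = n, (2.4) gives p(n) = exp(-beta1*n)/Z.
    beta1 is found by bisection, since the mean decreases as beta1 grows.
    """
    if not 0 < mean < N:
        raise ValueError("need 0 < mean < N")

    def dist(beta1):
        expo = [-beta1 * n for n in range(N + 1)]
        top = max(expo)                        # shift exponents to avoid overflow
        w = [math.exp(e - top) for e in expo]
        Z = sum(w)                             # partition function (scaled by e^-top)
        return [x / Z for x in w]

    def mean_of(beta1):
        return sum(n * p for n, p in enumerate(dist(beta1)))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_of(mid) > mean:
            lo = mid                           # mean too large: increase beta1
        else:
            hi = mid
    return dist((lo + hi) / 2)

p = max_entropy_mean(10, 3.0)
print(sum(n * x for n, x in enumerate(p)))  # approximately 3.0
```

When the prescribed mean is N/2, the multiplier tends to zero and the result is the uniform distribution. This matches the intuition that ME adds no structure beyond the constraints.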

Jaynes [7] has shown that, if the prior information includes all constraints actually operative during a random experiment, the distribution predicted by maximum entropy can be realised in overwhelmingly more ways than any other distribution. Shore and Johnson [22] have also shown that the principle of maximum entropy provides a "uniquely correct self-consistent method of inference" for estimating probability distributions based on the available information.

Maximum  entropy  formalism  can  be  applied  in  the  performance  analysis  of  queueing  systems 
because  expected  values  of  various  distributions  of  interest  are  usually  known  in  terms  of 
moments  of  the  interarrival  and  service  time  distributions.  A  review  of  entropy  maximisation 
for  approximate  analysis  of  queueing  systems  and  networks  can  be  seen  in  Kouvatsos  [16]. 


3  THE  GE  DISTRIBUTION 


The GE distribution is of the form

$$F(t) = P(X \le t) = 1 - \tau e^{-\sigma t}, \quad t \ge 0, \qquad (3.1)$$

where

$$\tau = \frac{2}{C^2 + 1}, \qquad \sigma = \tau \nu.$$

X is a mixed-time random variable (rv) of the interevent time, while 1/ν is the mean and C² is the SCV of rv X (see Figure 1).


Figure 1 The GE(ν, C²) distribution with parameters τ and σ.
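To make (3.1) concrete, the sketch below computes τ and σ, evaluates F(t) and draws samples. The sampler is valid for C² ≥ 1: a GE interevent time is zero with probability 1 − τ and otherwise exponential with rate σ, which reproduces F(t) = 1 − τe^(−σt). Function names and the fixed random seed are illustrative choices.

```python
import math
import random

def ge_params(nu, scv):
    """Return (tau, sigma) of GE(nu, C^2) from (3.1): tau = 2/(C^2+1), sigma = tau*nu."""
    tau = 2.0 / (scv + 1.0)
    return tau, tau * nu

def ge_cdf(t, nu, scv):
    """F(t) = 1 - tau * exp(-sigma * t) for t >= 0."""
    tau, sigma = ge_params(nu, scv)
    return 1.0 - tau * math.exp(-sigma * t)

def ge_sample(nu, scv, rng):
    """One interevent time (meaningful for C^2 >= 1): zero with probability
    1 - tau, otherwise exponential with rate sigma."""
    tau, sigma = ge_params(nu, scv)
    return rng.expovariate(sigma) if rng.random() < tau else 0.0

rng = random.Random(1)
xs = [ge_sample(2.0, 3.0, rng) for _ in range(100_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
print(m, v / m**2)  # should be close to 1/nu = 0.5 and C^2 = 3
```

The atom at zero, of mass 1 − τ = F(0), is what the text calls the mixed-time nature of the distribution.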

For C² > 1, the GE model is a mixed-time probability distribution and it can be interpreted as either

1. an extremal case of the family of two-phase exponential (M) distributions (e.g., Hyperexponential-2 (H2)) having the same ν and C², where one of the two phases has zero service time, or

2. a bulk type distribution with an underlying counting process equivalent to a Compound Poisson Process (CPP) with parameter 2ν/(C² + 1) and geometrically distributed bulk sizes with mean (C² + 1)/2 and SCV (C² − 1)/(C² + 1), given by


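$$P(N_{cp} = n) = \begin{cases} \displaystyle\sum_{i=1}^{n} e^{-\sigma} \frac{\sigma^i}{i!} \binom{n-1}{i-1} \tau^i (1-\tau)^{n-i}, & \text{if } n \ge 1,\\[2mm] e^{-\sigma}, & \text{if } n = 0, \end{cases} \qquad (3.2)$$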


where  Ncp  is  a  Compound  Poisson  rv  of  the  number  of  events  per  unit  time  corresponding  to  a 
stationary  GE-type  interevent  rv. 
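The counting distribution (3.2) can be evaluated directly. The sketch below (the name `cpp_pmf` is hypothetical) sums, over the number $i$ of bulks, the Poisson probability of $i$ bulks times the negative binomial probability that $i$ geometric bulks contain $n$ cells in total. The Poisson factor is computed in log space to avoid overflowing $i!$. It again assumes $C^2 \ge 1$; for $C^2 = 1$ the bulks have size one and (3.2) reduces to a Poisson distribution with mean $\nu$.

```python
import math


def cpp_pmf(n, nu, c2):
    """P(N_cp = n) from (3.2), assuming C^2 >= 1 (so that 0 < tau <= 1) and nu > 0."""
    tau = 2.0 / (c2 + 1.0)
    sigma = tau * nu
    if n == 0:
        return math.exp(-sigma)
    total = 0.0
    for i in range(1, n + 1):
        # Poisson(sigma) probability of exactly i bulks, computed in log space
        log_poisson = i * math.log(sigma) - math.lgamma(i + 1) - sigma
        # probability that i geometric(tau) bulks hold n cells in total
        negbin = math.comb(n - 1, i - 1) * tau ** i * (1.0 - tau) ** (n - i)
        total += math.exp(log_poisson) * negbin
    return total
```

Summing `cpp_pmf(n, nu, c2)` over a long enough range of `n` should give one, and the mean number of events per unit time should be $\nu$, since $\sigma$ bulks of mean size $1/\tau$ arrive per unit time.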

The GE distribution is versatile, possessing pseudo-memoryless properties which make the solution of many GE-type queueing systems and networks analytically tractable (e.g., Kouvatsos [16]). Moreover, it has been experimentally established that the GE model, due to its extremal nature, defines pessimistic performance bounds on typical performance measures over corresponding estimates based on two-phase distributions having the same first two moments as the GE. The GE distribution is completely characterised in terms of its mean rate $\nu$ and SCV $C^2$, and it can be interpreted as an ME solution (cf. Jaynes [7]), subject to the constraints of normalisation, discrete-time zero probability and expected value. In this sense, it can be viewed as the least biased distribution estimate, given the available information in terms of the constraints.

For $C^2 < 1$, the GE distributional model (with $F(0) < 0$) cannot be physically interpreted as a stochastic model. However, it can be meaningfully considered as a pseudo-distribution function of a flow model approximation of an underlying stochastic model in which negative branching pseudo-probabilities (or weights) are permitted. To this end, all analytical GE-type exact and approximate results obtained for queueing systems and networks when $C^2 \ge 1$ can also be used, by analogy, as useful heuristic approximations when $C^2 < 1$, as long as they satisfy basic queueing theoretic constraints (cf. [16]). Note that the use of other improper two-phase-type distributions (with $C^2 < 1$) in systems modelling has been proposed by various authors (e.g., Nojo and Watanabe [21], Sauer [23]).

4  ME  ANALYSIS  OF  A  SHARED  BUFFER  QUEUE 

Consider a general queueing model of a shared buffer switching element with bursty arrivals, depicted in Figure 2. The queueing model consists of $R$ parallel single-server queues, where $R$ is the number of output ports. Each server represents an output port and each queue corresponds to the address queue for that output port. There are $R \times R$ bursty and heterogeneous GE-type interarrival streams of cells, $R$ (multiple) streams to each of the $R$ input ports. Each stream $(j,i)$, $i,j = 1,2,\ldots,R$, has a mean overall arrival rate $\lambda_{ji}$ of cells and an SCV of interarrival time $C_{a_{ji}}^2$ (n.b., subscript $j$ is dropped in the case of a single stream per input port). Similarly, the transmission (or service) time of a cell at queue $i$ follows a GE distribution with mean rate $\mu_i$ and SCV $C_{s_i}^2$, $i = 1,2,\ldots,R$. Let $K$ be the size of the total shared buffer. A cell is lost if it arrives at a time when there is a total of $K$ cells in the $R$ queues. Without loss of generality, it is assumed that any of the $R$ queues may attain the maximum size $K$.


Figure 2 The $S_{R\times R}(GE^R/GE/1)/K$ queueing model of a shared buffer switch.
The queueing model of the shared buffer switching element is denoted by $S_{R\times R}(GE^R/GE/1)/K$, such that

1. The overall interarrival times and service times at an $R\times R$ shared buffer queue are heterogeneous and GE distributed,

2. Each output port has a single server,

3. The total shared buffer capacity of the switch is $K$.

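Moreover, let the state of the system at any given time be represented by a vector $\mathbf{n} = (n_1, n_2, \ldots, n_R)$, where $n_i$ is the number of cells in queue $i$, $i = 1,2,\ldots,R$, and

$$\mathbf{n} \in S(K,R) = \Big\{\mathbf{n} : n_i \ge 0,\ \sum_{i=1}^{R} n_i \le K\Big\}.$$

Also let $p(\mathbf{n})$, $\mathbf{n} \in S(K,R)$, be the joint state probability distribution.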

Note that the ME solution $p(\mathbf{n})$ of the $S_{R\times R}(GE^R/GE/1)/K$ queueing system is of the same form as the ME solution of an $S_{R\times R}(GE/GE/1)/K$ queueing system with a single (merged) arrival stream at each of the $R$ input ports (cf. Kouvatsos [14]), subject to a common set of mean value constraints. Both solutions are presented below.

4.1 An ME Solution for the $S_{R\times R}(GE/GE/1)/K$ Queueing System: an Outline

The form of the ME solution of an $S_{R\times R}(GE/GE/1)/K$ queueing system, subject to normalisation and the constraints of server utilisation $U_i$, $0 < U_i < 1$; MQL $L_i$, $U_i < L_i < K$; and conditional aggregate probability $\phi_i$ of a full buffer given $n_i > 0$, $0 < \phi_i < 1$, $i = 1,2,\ldots,R$, is given by the method of Lagrange's undetermined multipliers as (cf. (2.4))


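$$p(\mathbf{n}) = \frac{1}{Z}\prod_{i=1}^{R} g_i^{s_i(\mathbf{n})}\, x_i^{n_i}\, y_i^{f_i(\mathbf{n})}, \qquad \mathbf{n} \in S(K,R), \tag{4.1}$$

where $Z$ is the normalising constant

$$Z = \sum_{\mathbf{n} \in S(K,R)} \prod_{i=1}^{R} g_i^{s_i(\mathbf{n})}\, x_i^{n_i}\, y_i^{f_i(\mathbf{n})},$$

$s_i(\mathbf{n})$ and $f_i(\mathbf{n})$ are auxiliary (indicator) functions defined by

$$s_i(\mathbf{n}) = \begin{cases} 1, & n_i > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad f_i(\mathbf{n}) = \begin{cases} 1, & \sum_{j=1}^{R} n_j = K \ \text{and}\ s_i(\mathbf{n}) = 1, \\ 0, & \text{otherwise}, \end{cases}$$

and $\{g_i, x_i, y_i : i = 1,2,\ldots,R\}$ are the GE-type Lagrangian coefficients corresponding to the constraints $\{U_i, L_i, \phi_i : i = 1,2,\ldots,R\}$, respectively.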

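The Lagrangian coefficients $\{g_i, x_i : i = 1,2,\ldots,R\}$ are obtained by making asymptotic connections with the ME solution of a stable GE/GE/1 queue (cf. [11]), namely

$$g_i = \frac{\rho_i(1-x_i)}{x_i(1-\rho_i)}, \qquad x_i = \frac{L_i - \rho_i}{L_i}, \qquad \rho_i = \lambda_i/\mu_i, \quad i = 1,2,\ldots,R, \tag{4.2}$$

where $L_i$ is the MQL of a stable GE/GE/1 queue,

$$L_i = \frac{\rho_i}{2}\left(1 + \frac{C_{a_i}^2 + \rho_i C_{s_i}^2}{1-\rho_i}\right), \qquad i = 1,2,\ldots,R$$

(i.e., $g_i$ and $x_i$ are assumed to be invariant to the buffer size $K$).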
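The asymptotic connection in (4.2) can be made concrete. The sketch below (helper names are ours) computes $g_i$ and $x_i$ for one output port and checks them against the infinite-capacity ME queue length distribution $p(0) = 1-\rho$, $p(n) = (1-\rho)\,g\,x^{n}$ for $n \ge 1$, which is the single-queue form of (4.1) when no full-buffer factor is present. That distribution should sum to one and have mean $L_i$. It assumes $\rho_i < 1$ and parameters for which $L_i > \rho_i$, so that $0 < x_i < 1$.

```python
def mql_ge_ge_1(lam, mu, ca2, cs2):
    """MQL L_i of a stable GE/GE/1 queue, as used in (4.2)."""
    rho = lam / mu
    return 0.5 * rho * (1.0 + (ca2 + rho * cs2) / (1.0 - rho))


def lagrange_gx(lam, mu, ca2, cs2):
    """Lagrangian coefficients g_i and x_i of (4.2)."""
    rho = lam / mu
    L = mql_ge_ge_1(lam, mu, ca2, cs2)
    x = (L - rho) / L
    g = rho * (1.0 - x) / (x * (1.0 - rho))
    return g, x


def me_qld(lam, mu, ca2, cs2, n_max):
    """Truncated ME QLD of the infinite GE/GE/1 queue: p(0) = 1 - rho, p(n) = (1 - rho) g x^n."""
    rho = lam / mu
    g, x = lagrange_gx(lam, mu, ca2, cs2)
    return [1.0 - rho] + [(1.0 - rho) * g * x ** n for n in range(1, n_max + 1)]
```

With Poisson arrivals and exponential service ($C_{a}^2 = C_{s}^2 = 1$) the coefficients reduce to $g = 1$ and $x = \rho$, i.e. the geometric M/M/1 distribution.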

Moreover, the Lagrangian coefficients $\{y_i : i = 1,2,\ldots,R\}$ can be computed by

1. Focusing on the flow balance equations

$$\lambda_i(1-\pi_i) = \mu_i U_i, \qquad i = 1,2,\ldots,R, \tag{4.3}$$

where $\pi_i$ is the cell loss probability for an attempted arrival to the output port queue $i$,

2. Deriving recursive expressions for $\pi_i$ and $U_i$, $i = 1,2,\ldots,R$, and

3. Solving numerically the resultant non-linear simultaneous equations (n.b., for $R = 2$, these equations can be solved analytically; see formulae (4.18)).

The normalising constant can be determined by applying the generating function approach and can be computed recursively by [14]

$$Z = \sum_{v=0}^{K-1} C_1(v) + C_2(K), \tag{4.4}$$

where $\{C_1(v) : v = 0,1,\ldots,K-1\}$ and $C_2(K)$ are determined via the following recursive formulae:

$$C_1(v) = C_1^R(v), \quad v = 0,1,\ldots,K-1, \qquad C_2(K) = C_2^R(K),$$

where

$$C_k^r(v) = C_k^{r-1}(v) - (1 - B_{kr})\, x_r\, C_k^{r-1}(v-1) + x_r\, C_k^r(v-1), \qquad B_{kr} = \begin{cases} g_r, & k = 1, \\ g_r y_r, & k = 2, \end{cases}$$

for $k = 1,2$, $r = 2,\ldots,R$, $v = 1,2,\ldots,K-2+k$, with initial conditions

$$C_k^1(v) = \begin{cases} 1, & v = 0, \\ B_{k1}\, x_1^{v}, & v = 1,2,\ldots,K-2+k, \end{cases} \qquad C_k^r(0) = 1, \quad r = 2,\ldots,R,$$

for $k = 1,2$.


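Similarly, the utilisation $U_i$ can be expressed as

$$U_i = \frac{1}{Z}\left(\sum_{v=1}^{K-1} C_1^{(i)}(v) + C_2^{(i)}(K)\right), \qquad i = 1,2,\ldots,R, \tag{4.5}$$

where

$$C_k^{(i)}(v) = (1 - B_{ki})\, x_i\, C_k^{(i)}(v-1) + B_{ki}\, x_i\, C_k^R(v-1), \qquad v = 2,\ldots,K-2+k,$$

$k = 1,2$, $i = 1,2,\ldots,R$, with initial conditions $C_k^{(i)}(1) = B_{ki}\, x_i$.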

The marginal state probabilities $\{p_i(\ell_i) : \ell_i = 0,\ldots,K\}$ can be determined by using ME solution (4.1) and the recursive expressions for $C_k^{(i)}(v)$. Let $n(i)$ be the random variable for the number of cells at queue $i$, $i = 1,2,\ldots,R$. Then the marginal state probabilities are given by (cf. [14])

$$p_i(\ell_i) = \Pr[n(i) \ge \ell_i] - \Pr[n(i) \ge \ell_i + 1], \tag{4.6}$$

where $\Pr[n(i) \ge 0] = 1$, $\Pr[n(i) \ge K+1] = 0$ and

$$\Pr[n(i) \ge \ell_i] = \frac{x_i^{\ell_i - 1}}{Z}\left(\sum_{v=\ell_i}^{K-1} C_1^{(i)}(v - \ell_i + 1) + C_2^{(i)}(K - \ell_i + 1)\right), \qquad i = 1,2,\ldots,R,\ \ell_i = 1,2,\ldots,K.$$
Finally, the aggregate state probabilities $\{p(n) : n = 0,\ldots,K\}$ are given by

$$p(n) = \begin{cases} 1/Z, & n = 0, \\ C_1(n)/Z, & n = 1,2,\ldots,K-1, \\ C_2(K)/Z, & n = K. \end{cases} \tag{4.7}$$
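The recursions (4.4)-(4.7) translate directly into code. The following sketch (helper names are ours) builds the tables $C_k^r(v)$ and $C_k^{(i)}(v)$ for given Lagrangian coefficients, then evaluates $Z$, $U_i$, the marginal and the aggregate state probabilities. It also includes a brute-force evaluation of (4.1) over $S(K,R)$ for cross-checking small cases. Queue indices are 0-based in the code, and $K \ge 2$ is assumed.

```python
from itertools import product


def recursions(g, x, y, K):
    """Tables C[k][v] = C_k^R(v) and Ci[k][i][v] = C_k^{(i)}(v), v = 0..K-2+k (K >= 2)."""
    R = len(g)
    C, Ci = {}, {}
    for k in (1, 2):
        vmax = K - 2 + k
        B = [g[r] * (y[r] if k == 2 else 1.0) for r in range(R)]      # B_{kr}
        prev = [1.0] + [B[0] * x[0] ** v for v in range(1, vmax + 1)]  # C_k^1(v)
        for r in range(1, R):
            cur = [1.0] + [0.0] * vmax                                 # C_k^r(0) = 1
            for v in range(1, vmax + 1):
                cur[v] = prev[v] - (1 - B[r]) * x[r] * prev[v - 1] + x[r] * cur[v - 1]
            prev = cur
        C[k] = prev
        Ci[k] = []
        for i in range(R):
            ci = [0.0] * (vmax + 1)
            ci[1] = B[i] * x[i]                                        # C_k^{(i)}(1)
            for v in range(2, vmax + 1):
                ci[v] = (1 - B[i]) * x[i] * ci[v - 1] + B[i] * x[i] * C[k][v - 1]
            Ci[k].append(ci)
    return C, Ci


def performance(g, x, y, K):
    """Z (4.4), U_i (4.5), marginal p_i(l) (4.6) and aggregate p(n) (4.7)."""
    R = len(g)
    C, Ci = recursions(g, x, y, K)
    Z = sum(C[1][:K]) + C[2][K]
    U = [(sum(Ci[1][i][1:K]) + Ci[2][i][K]) / Z for i in range(R)]

    def tail(i, ell):  # Pr[n(i) >= ell]
        if ell == 0:
            return 1.0
        if ell > K:
            return 0.0
        s = sum(Ci[1][i][v - ell + 1] for v in range(ell, K)) + Ci[2][i][K - ell + 1]
        return x[i] ** (ell - 1) * s / Z

    marg = [[tail(i, ell) - tail(i, ell + 1) for ell in range(K + 1)] for i in range(R)]
    agg = [1.0 / Z] + [C[1][n] / Z for n in range(1, K)] + [C[2][K] / Z]
    return Z, U, marg, agg


def brute_force(g, x, y, K):
    """Direct evaluation of (4.1) over S(K, R), for cross-checking small cases."""
    R = len(g)
    w = {}
    for n in product(range(K + 1), repeat=R):
        if sum(n) > K:
            continue
        val = 1.0
        for i, ni in enumerate(n):
            if ni > 0:                                        # s_i(n) = 1
                val *= g[i] * (y[i] if sum(n) == K else 1.0)  # f_i(n) = 1 at a full buffer
            val *= x[i] ** ni
        w[n] = val
    Z = sum(w.values())
    return Z, {n: val / Z for n, val in w.items()}
```

The recursive route costs $O(RK)$ operations per table, whereas the brute-force sum grows like $K^R$, which is why the recursion is the practical choice for larger switching elements.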


4.2 An ME Solution for the $S_{R\times R}(GE^R/GE/1)/K$ Queueing System: An
Extension


Earlier applications of entropy maximisation (e.g., [10, 12, 13, 17]) to arbitrary QNMs and shared buffer queues imply a decomposition into individual queueing systems with revised GE or GGeo-type interarrival and service time processes. These processes utilise analytic functions describing GE or GGeo-type flows amongst the queues of the network. Flows are split when going to different destinations and merged when converging from different sources. The formulae used to split flows are exact in the case of random routing. For merging GE or GGeo flows, a two-moment matching function is used to approximate the resultant stream with a GE or GGeo-type stream. This last operation may lead to some inaccuracies in extremal cases, where there are large differences in the SCVs of the merging flows.

In this work, an ME QLD is proposed for an $S_{R\times R}(GE^R/GE/1)/K$ queueing system which employs multiple input streams. This ME solution is of the same form as (4.1), subject to the mean value constraints $\{U_i, L_i, \phi_i : i = 1,2,\ldots,R\}$. The Lagrangian coefficients $g_i$ and $x_i$ of ME solution (4.1) are assumed to be invariant to the buffer size and are thus of the same form as those of a stable $GE^R/GE/1$ queue (see Appendix I), i.e., $\{g_i, x_i : i = 1,2,\ldots,R\}$ are determined by making asymptotic connections with the ME solution of a stable $GE^R/GE/1$ queue and are given by

$$g_i = \frac{\rho_i(1-x_i)}{x_i(1-\rho_i)}, \qquad x_i = \frac{L_i - \rho_i}{L_i}, \qquad i = 1,2,\ldots,R, \tag{4.8}$$

where $L_i$ is the MQL of a stable $GE^R/GE/1$ queue (see Appendix I), given by

$$L_i = \frac{1}{2}\left(\rho_i + \frac{\sum_{j=1}^{R} \rho_{ji}\, C_{a_{ji}}^2 + \rho_i^2\, C_{s_i}^2}{1-\rho_i}\right), \tag{4.9}$$

with $C_{a_{ji}}^2$ the SCV of stream $(j,i)$, $\rho_{ji} = \lambda_{ji}/\mu_i$, $j = 1,2,\ldots,R$, and $\rho_i = \sum_{j=1}^{R} \rho_{ji}$.
Equating $g_i$ and $x_i$ of the ME solution of a stable $GE^R/GE/1$ queue with those of a stable GE/GE/1 queue with overall (merged) interarrival parameters $\lambda_i$ and $C_{a_i}^2$, the following relationship can be established:

$$\lambda_i\, C_{a_i}^2 = \sum_{j=1}^{R} \lambda_{ji}\, C_{a_{ji}}^2, \qquad i = 1,2,\ldots,R. \tag{4.10}$$


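Thus, the ME solution of a stable $GE^R/GE/1$ queue can be considered as an ME solution of a stable GE/GE/1 queue with merged arrival processes having as parameters

$$\lambda_i = \sum_{j=1}^{R} \lambda_{ji}, \qquad i = 1,2,\ldots,R, \tag{4.11}$$

and

$$C_{a_i}^2 = \sum_{j=1}^{R} \frac{\lambda_{ji}}{\lambda_i}\, C_{a_{ji}}^2, \qquad i = 1,2,\ldots,R. \tag{4.12}$$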


Note that expressions (4.11) and (4.12) turn out to be identical to those suggested by Gelenbe and Pujolle [3]. Moreover, the interdeparture process of a stable $GE^R/GE/1$ queue has an SCV given by (cf. [10, 16])

$$C_{d_i}^2 = \rho_i(1-\rho_i) + \rho_i^2\, C_{s_i}^2 + (1-\rho_i)\, C_{a_i}^2, \qquad i = 1,2,\ldots,R. \tag{4.13}$$
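The merging relations (4.9)-(4.12) and the departure SCV (4.13) are easy to check numerically. In this sketch (hypothetical helper names, one output port with service rate `mu` and SCV `cs2`), the MQL from (4.9) with several input streams is compared with the single-stream GE/GE/1 MQL of (4.2) evaluated at the merged parameters (4.11)-(4.12). The two should agree, which is exactly the content of (4.10). The last function evaluates (4.13).

```python
def merge_streams(lam_in, ca2_in):
    """Merged rate (4.11) and SCV (4.12) of the streams offered to one output port."""
    lam = sum(lam_in)
    ca2 = sum(l * c for l, c in zip(lam_in, ca2_in)) / lam
    return lam, ca2


def mql_multi(lam_in, ca2_in, mu, cs2):
    """MQL of a stable GE^R/GE/1 queue, eq. (4.9)."""
    rho_j = [l / mu for l in lam_in]
    rho = sum(rho_j)
    num = sum(r * c for r, c in zip(rho_j, ca2_in)) + rho ** 2 * cs2
    return 0.5 * (rho + num / (1.0 - rho))


def mql_single(lam, ca2, mu, cs2):
    """MQL of a stable GE/GE/1 queue, as in (4.2)."""
    rho = lam / mu
    return 0.5 * rho * (1.0 + (ca2 + rho * cs2) / (1.0 - rho))


def departure_scv(lam, ca2, mu, cs2):
    """Interdeparture SCV of a stable queue, eq. (4.13)."""
    rho = lam / mu
    return rho * (1.0 - rho) + rho ** 2 * cs2 + (1.0 - rho) * ca2
```

With Poisson arrivals and exponential service, `departure_scv` returns exactly 1, as expected for the output of an M/M/1 queue.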


Let $\{\pi_{ji} : i,j = 1,\ldots,R\}$ be the CLPs of input streams $\{j\}$ at output ports $\{i\}$ of a shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queue. These probabilities can be obtained by using GE-type arguments similar to those applied in the case of the shared buffer $S_{R\times R}(GE/GE/1)/K$ queue (cf. [14]) and are given by

$$\pi_{ji} = \frac{1}{Z}\big(F_{ji}(K) + C_2(K)\big), \tag{4.14}$$

where $(j,i)$, $i,j = 1,2,\ldots,R$, is the $j$th flow to output port $i$, and

$$F_{ji}(K) = \delta_{ji} \sum_{v=0}^{K-1} C_1(v)\,(1-\tau_{a_{ji}})^{K-v} + (1-\delta_{ji}) \sum_{v=1}^{K-1} C_1^{(i)}(v)\,(1-\tau_{a_{ji}})^{K-v},$$

where

$$\delta_{ji} = \frac{\tau_{s_i}}{\tau_{a_{ji}}(1-\tau_{s_i}) + \tau_{s_i}}, \tag{4.15}$$

with

$$\tau_{s_i} = \frac{2}{C_{s_i}^2 + 1}, \qquad \tau_{a_{ji}} = \frac{2}{C_{a_{ji}}^2 + 1}, \qquad i,j = 1,2,\ldots,R,\ K \ge 2.$$


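The Lagrangian coefficients $\{y_i : i = 1,2,\ldots,R\}$ of the $S_{R\times R}(GE^R/GE/1)/K$ queue can be determined by using the flow balance conditions

$$\sum_{j=1}^{R} \lambda_{ji}(1-\pi_{ji}) = U_i\, \mu_i, \qquad i = 1,2,\ldots,R. \tag{4.16}$$

Substituting (4.14), together with (4.5) for $U_i$, into (4.16), the following system of $R$ non-linear equations with $R$ unknowns $\{y_i : i = 1,2,\ldots,R\}$ is obtained:

$$\sum_{j=1}^{R} \lambda_{ji}\Big(Z - F_{ji}(K) - C_2(K)\Big) = \mu_i\left(\sum_{v=1}^{K-1} C_1^{(i)}(v) + C_2^{(i)}(K)\right), \tag{4.17}$$

for all $i = 1,2,\ldots,R$ and $K \ge 2$.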

System (4.17) can be solved by applying the Newton-Raphson algorithm, which is generally expected to give quadratic convergence. One significant limitation of this method is that the Jacobian matrix of partial derivatives must be calculated at each iteration. However, this requirement may be avoided by applying an efficient recursive scheme suggested in [14]. Thus, because of the recursive nature of the z-transforms used in the computational implementation of the ME solution, the $S_{R\times R}(GE^R/GE/1)/K$ queueing model can be used as an effective building block in the analysis of large MINs.

Note that in the special case $R = 2$, equations (4.17) can be solved analytically, yielding the following closed-form expressions


Tl 


2tfi*i 


(<JA  -  B), 


(4  18) 


where 


A  = 


yK  yK 
Xl  -X2 


K- 1 


„£-l 


+icf 


(K)-C«\K)\ 


+  2|  C[2) 


(K)  +  C^(K)]- 


.K- 1 


K- 1  ! 


B  = 


yK  yK 
xl  ~x2 


X,  -X, 


xf-'-xf-1  L“Z  z  V  /Jxf-xf ' 

-x*)p(c?\K)-C$\K)){x,  -x2) 


and 


y  2  = 


S2X2^X\  X2  ) 


(4.19) 


Proofs of equations (4.18) and (4.19) are given in Appendix II.
5 ME ANALYSIS OF BANYAN MINs WITH ARBITRARY SWITCH SIZES

Consider  a  packet-switched  finite  buffered  ATM  switch  with  a  Banyan  MIN-based  architecture 
depicted  in  Figure  3.  The  ATM  switch  consists  of  L  levels  and  M  stages  and  employs  as  basic 
building  blocks  R-input  and  R-output  shared  buffer  switching  elements  (RxR  crossbar 
switches). 

Let switch-$(l,m)$ denote a switching element located at the $l$th level and $m$th stage of the MIN. Each output (input) port is connected to an output (input) pin. The input and output pins of each switching element (including the MIN's external input and output pins) are labelled “input-$h$” and “output-$h$”, $h = 0,1,\ldots,R-1$, from top to bottom, respectively. In regular Banyan MINs, where all switching elements are the same size, $M = \log_R N$, where $R$ is the size of each switching element and $N$ is the number of external inputs (or outputs). Regular Banyan MINs form an array of switching elements, and in this case the number of switching elements in a row is referred to as the level $L$, where $L = N/R$.

The input/output ports of the MIN form an array of ‘pins’ which are indexed by row, then column. There are $N$ pins at each stage. Each output pin is linked to a single downstream input pin at the next stage. Connections from output port pins to input port pins can be made in an arbitrary way. These connections form the topology of the network and are represented in the forward (FTM) and backward (BTM) topology matrices. Note that in a Banyan MIN only one path exists between an external input pin and an external output pin. The FTM and BTM both have $M$ columns and $N$ rows, representing the grid of output and input port pins, respectively. Element $(n,m)$ holds the number of the input {output} port pin at the $(m+1)$th {$(m-1)$th} stage that is connected to output {input} port pin $n$ at the $m$th stage, respectively.

The traffic arriving at the external input pins of the MIN is assumed to be bursty and is represented by GE interarrival times. The service (transmission) times at the output ports are also assumed to be GE distributed with mean $1/\mu_k$ and SCV $C_{s_k}^2$. The flow to external input pin $k$ is parameterised by the overall mean arrival rate $\lambda_k$ and the SCV of interarrival times $C_{a_k}^2$. Incoming cells traverse the network according to both the network's topology matrices and $\{r_{ks}\}_{N\times N}$, the routing probability matrix, where $r_{ks}$ is the probability that a cell originating at external input pin $k$ has external output pin $s$ as its destination. Cells arrive in geometrically distributed bulks, with an average bulk size of $(C_{a_k}^2 + 1)/2$. Cells that arrive in the same bulk take the same route across the MIN, i.e., the routing decision is made on a per-bulk basis. It is assumed that stage 0 switching elements at the input edges of the MIN may have infinite or finite capacity buffers, $\{K_{l0} : l = 0,1,\ldots,L-1\}$. Moreover, switching elements in the interior or last stage of the MIN each have a fixed finite capacity buffer, $\{K_{lm} : m = 1,2,\ldots,M-1,\ l = 0,1,\ldots,L-1\}$. A cell is lost if, on arrival at a stage 0 switching element, it finds a full buffer. However, every cell that enters the MIN is guaranteed delivery to its destination. This constraint, along with the finite buffers of internal switches, implies that the MIN internally operates a blocking mechanism, which in this paper is based on RS blocking (cf. Introduction).
Figure 3 An 8x8 configuration of a regular Banyan network


5.1 An ME Product-Form Approximation

Suppose that at any given time the joint state of the network is denoted by $\mathbf{n} = (\mathbf{n}_{11},\ldots,\mathbf{n}_{LM})$, where $\mathbf{n}_{ij} = (n_{ij1}, n_{ij2}, \ldots, n_{ijR})$ is the joint state of the shared buffer queueing model of switch-$(i,j)$ and $n_{ijk}$ is the number of cells queueing for output port $k$, $k = 1,2,\ldots,R$. Moreover, let $p(\mathbf{n})$ be the joint state probability of the network at any given time. The form of an ME solution $p(\mathbf{n})$ of a Banyan MIN, subject to normalisation and the marginal constraints of the shared buffer queueing systems used in Section 4, namely utilisation $U_{ijk}$, $0 < U_{ijk} < 1$; MQL $L_{ijk}$, $U_{ijk} < L_{ijk} < K_{ij}$; and conditional aggregate full buffer probability $\phi_{ijk}$ given $n_{ijk} > 0$, $0 < \phi_{ijk} < 1$, for $k = 1,2,\ldots,R$, $i = 1,2,\ldots,L$, $j = 1,2,\ldots,M$, is given, via the method of Lagrange's undetermined multipliers, as

$$p(\mathbf{n}) = \frac{1}{Z}\prod_{i=1}^{L}\prod_{j=1}^{M}\prod_{k=1}^{R} g_{ijk}^{s_{ijk}(\mathbf{n}_{ij})}\, x_{ijk}^{n_{ijk}}\, y_{ijk}^{f_{ijk}(\mathbf{n}_{ij})}, \tag{5.1}$$

where $Z$ is the normalising constant, $\{g_{ijk}, x_{ijk}, y_{ijk}\}$ are the Lagrangian coefficients corresponding to the constraints $\{U_{ijk}, L_{ijk}, \phi_{ijk}\}$, respectively, and $s_{ijk}(\mathbf{n}_{ij})$ and $f_{ijk}(\mathbf{n}_{ij})$ are indicator functions such that $s_{ijk}(\mathbf{n}_{ij}) = 1$ if $n_{ijk} > 0$, or 0 otherwise, and $f_{ijk}(\mathbf{n}_{ij}) = 1$ if $\sum_{h=1}^{R} n_{ijh} = K_{ij}$ and $s_{ijk}(\mathbf{n}_{ij}) = 1$, or 0 otherwise, $k = 1,2,\ldots,R$. The form of ME solution (5.1) clearly suggests a product-form approximation, namely

$$p(\mathbf{n}) = \prod_{i=1}^{L}\prod_{j=1}^{M} p_{ij}(\mathbf{n}_{ij}), \tag{5.2}$$

where $p_{ij}(\mathbf{n}_{ij})$ is determined by the ME solution (4.1) of each shared buffer queueing model.

The  ME  solution  (5.1)  can  be  implemented  computationally  by  decomposing  the  network 
into  individual  building  blocks  of  shared  buffer  switches-(ij)  with  modified  arrival  and  service 
parameters  which  capture  the  characteristics  of  the  Banyan  MIN. 

5.2  Flow  Through  the  Switching  Elements  of  a  Banyan  MIN 

The flow rate from each input pin through to each output pin of a Banyan network is calculated from the flow rate entering each input pin and the routing probability matrix $\{r_{ks}\}_{N\times N}$. Let $\Lambda_{ks}$ be the effective flow rate from external input pin $k$ to external output pin $s$. Then, it follows that

$$\Lambda_{ks} = \lambda_k (1-\pi_k)\, r_{ks}, \qquad k,s = 0,1,\ldots,N-1, \tag{5.3}$$

where $\pi_k = \pi_{a_i}$, $i$ is the input port of a switch at stage 0 that corresponds to input pin $k$, and $\pi_{a_i}$ is the aggregate CLP of input port $i$, i.e., the probability that an arriving cell via external input $k$ will be turned away (cf. Section 6.1).

In Banyan networks only one path exists between $k$ and $s$, so $\Lambda_{ks}$ is the contribution of flow given to each switch on the path from $k$ to $s$. The effective flow rates $\{\Lambda_{ji}\}$ across input-output pin pairs $\{(j,i) : i,j = 1,2,\ldots,R\}$ of a switching element can be obtained by appropriate summation of the flows $\{\Lambda_{ks}\}$. For each input pin $j$, it is necessary to know the set of external input pins which connect to it (generally through other switches). Likewise, for each output pin $i$, it is necessary to know the set of external output pins to which it ultimately connects. Let these sets be denoted by Inpins$(j,m)$ and Outpins$(i,m)$, where $(j,m)$ and $(i,m)$ represent input pin $j$ of a switching element and output pin $i$, both at stage $m$, respectively. Any path that originates from an input pin in Inpins$(j,m)$ and terminates at an output pin in Outpins$(i,m)$ must pass through input $j$ to output $i$. Thus, the effective flow rate from input pin $j$ to output pin $i$, $\Lambda_{ji}$, is given by

$$\Lambda_{ji} = \sum_{k \in \mathrm{Inpins}(j,m)}\ \sum_{s \in \mathrm{Outpins}(i,m)} \Lambda_{ks}, \qquad i,j = 1,\ldots,R. \tag{5.4}$$

The method of calculating Inpins$(j,m)$ and Outpins$(i,m)$ is given in Appendix III.
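The following sketch illustrates (5.3)-(5.4) on a small example. It is not the method of Appendix III: it simply computes Inpins and Outpins by reachability over a forward topology matrix. The wiring is an assumption made for illustration, namely an $8\times 8$ network ($N = 8$, $R = 2$, $M = 3$) in which output pin $p$ of stage $m$ is linked to input pin $\mathrm{shuffle}(p)$ of stage $m+1$, where $\mathrm{shuffle}$ is the perfect shuffle (cyclic left rotation of the 3-bit pin number). This Omega-type Banyan need not coincide with Figure 3. Pins, stages and switches are numbered from 0, external input pin $q$ is taken to feed stage 0 input pin $q$, and the backward matrix is obtained by inverting the forward one. All names are hypothetical.

```python
def effective_pair_flows(lam, pi, r):
    """Lambda_{ks} = lambda_k (1 - pi_k) r_{ks}, eq. (5.3)."""
    n = len(lam)
    return [[lam[k] * (1.0 - pi[k]) * r[k][s] for s in range(n)] for k in range(n)]


def shuffle(p, n_bits):
    """Perfect shuffle: rotate the n_bits-bit pin number one place to the left."""
    return ((p << 1) | (p >> (n_bits - 1))) & ((1 << n_bits) - 1)


def build_ftm(N, M):
    """Hypothetical FTM: ftm[m][p] = input pin at stage m+1 fed by output pin p at stage m."""
    n_bits = N.bit_length() - 1
    return [[shuffle(p, n_bits) for p in range(N)] for _ in range(M - 1)]


def outpins(ftm, N, M, R):
    """out[m][p] = external output pins reachable from output pin p at stage m."""
    out = [[set() for _ in range(N)] for _ in range(M)]
    for p in range(N):
        out[M - 1][p] = {p}
    for m in range(M - 2, -1, -1):
        for p in range(N):
            base = (ftm[m][p] // R) * R          # first pin of the downstream element
            for o in range(base, base + R):      # an element links every input to every output
                out[m][p] |= out[m + 1][o]
    return out


def inpins(ftm, N, M, R):
    """inp[m][q] = external input pins that can reach input pin q at stage m."""
    btm = [[0] * N for _ in range(M - 1)]
    for m in range(M - 1):
        for p in range(N):
            btm[m][ftm[m][p]] = p                # input pin at m+1 -> feeding output pin at m
    inp = [[set() for _ in range(N)] for _ in range(M)]
    for q in range(N):
        inp[0][q] = {q}
    for m in range(1, M):
        for q in range(N):
            base = (btm[m - 1][q] // R) * R      # first pin of the upstream element
            for i in range(base, base + R):
                inp[m][q] |= inp[m - 1][i]
    return inp


def element_flows(Lam, ftm, N, M, R, m, switch):
    """Lambda_{ji} of eq. (5.4) for element `switch` at stage m (local pins 0..R-1)."""
    inp, out = inpins(ftm, N, M, R), outpins(ftm, N, M, R)
    base = switch * R
    return [[sum(Lam[k][s] for k in inp[m][base + j] for s in out[m][base + i])
             for i in range(R)] for j in range(R)]
```

With uniform routing, unit input rates and no losses, every pin pair of every element carries $4 \times 1/8 = 0.5$: each input pin at stage $m$ is reached from $2^m$ external inputs and each output pin reaches $2^{M-1-m}$ external outputs, which is the unique-path property of a Banyan network.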

Note that the shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queueing model and product-form approximation (5.1) are applicable to the performance analysis of packet-switched finite buffered MINs with arbitrary configuration. However, in this more general case there is more than one path through the MIN connecting an external input pin with an external output pin, and thus some form of routing description is needed, in addition, to specify the flow.
6 ME ANALYSIS OF SWITCHING ELEMENTS WITHIN THE BANYAN
NETWORK

This section presents an approximate ME analysis of three types of shared buffer queueing models of switching elements within the Banyan network, based on the $S_{R\times R}(GE^R/GE/1)/K$ building block queue and GE-type flow formulae. Note that, for presentational purposes, only subscripts for input/output ports and related flow streams are denoted in this and subsequent sections.

6.1 Case 1: Switching Elements at the Input Edges of the Network

When a switch is at the input edge of the Banyan network, the actual (overall) arrival parameters are known. However, due to potential RS blocking from second stage switching elements, the perceived (effective) service time (i.e., the total transmission time experienced by each packet) has to be calculated. The effective service time can be expressed in terms of the blocking probabilities. A service completer which finds its downstream buffer full repeats its service. As each output port is connected to only one input pin of a downstream switching element, it is appropriate to calculate the effective service time in terms of the overall blocking probability that a service completer at output port queue $i$ experiences at its downstream switch. This overall blocking probability is clearly given by

$$\pi_{a_k} = \sum_{l=1}^{R} \frac{\lambda_{kl}}{\lambda_k}\, \pi_{kl}, \qquad \lambda_k = \sum_{l=1}^{R} \lambda_{kl}, \qquad i,k = 1,2,\ldots,R, \tag{6.1}$$

where $k$ is the input pin of a switching element at the next stage which is connected with output pin $i$ (defined in the FTM), $l$ is an output pin of the same element which is connected with $k$, and $\lambda_{kl}$ is the overall arrival rate from input pin $k$ to output pin $l$, $l = 1,2,\ldots,R$.

By considering GE-type probabilistic arguments, the effective service time parameters can be expressed by (cf. [12])

$$\tilde\mu_i = \mu_i(1-\pi_{a_k}), \qquad i = 1,2,\ldots,R, \tag{6.2}$$

and

$$\tilde C_{s_i}^2 = \pi_{a_k} + (1-\pi_{a_k})\, C_{s_i}^2, \qquad i = 1,2,\ldots,R. \tag{6.3}$$
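The repeated-service (RS) adjustment (6.2)-(6.3) can be checked against the moments of a geometric number of independent service attempts. The sketch below assumes, as the GE-type argument does, that each attempt is blocked independently with the same probability $\pi_{a_k}$; the helper names are ours. The first function is the weighted average of (6.1), which has the same form as (6.9).

```python
def weighted_blocking(lam_row, pi_row):
    """pi_{a_k} = sum_l (lambda_kl / lambda_k) pi_kl, eq. (6.1); same form as (6.9)."""
    return sum(l * p for l, p in zip(lam_row, pi_row)) / sum(lam_row)


def effective_service(mu, cs2, pi_block):
    """Effective rate (6.2) and SCV (6.3) under repetitive-service blocking."""
    return mu * (1.0 - pi_block), pi_block + (1.0 - pi_block) * cs2


def repeated_service_moments(mu, cs2, pi_block):
    """Mean and SCV of a Geometric(1 - pi_block) number of i.i.d. attempts,
    each with mean 1/mu and SCV cs2 (independence is an assumption)."""
    es, var_s = 1.0 / mu, cs2 / mu ** 2
    en = 1.0 / (1.0 - pi_block)                 # E[number of attempts]
    var_n = pi_block / (1.0 - pi_block) ** 2    # Var[number of attempts]
    mean = en * es
    var = en * var_s + var_n * es ** 2          # variance of a random sum
    return mean, var / mean ** 2
```

The random-sum moments reproduce $1/\tilde\mu_i$ and $\tilde C_{s_i}^2$ exactly, e.g. for $\mu_i = 2$, $C_{s_i}^2 = 3$ and $\pi_{a_k} = 0.1$ both routes give an effective SCV of 2.8.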

The arrival rate from the external input stream $j$ to output pin $i$, $\lambda_{ji}$, is obtained by multiplying the input rate $\lambda_j$ by the sum of the appropriate routing probabilities (i.e., adding together the probabilities from $j$ to all external (destination) output pins that are reached through $i$), namely

$$\lambda_{ji} = \lambda_j \sum_{t \in \mathrm{Outpins}(i,0)} r_{jt}, \qquad i,j = 1,2,\ldots,R. \tag{6.4}$$
Moreover, the SCV of the interarrival process from input stream $j$ to output pin $i$ is the same as that of the external interarrival time, as routing occurs on a per-bulk basis (see Section 5), i.e.,

$$C_{a_{ji}}^2 = C_{a_j}^2, \qquad i,j = 1,2,\ldots,R, \tag{6.5}$$

where $C_{a_j}^2$ is the SCV of the overall interarrival time at external input pin $j$.
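For a stage 0 element, (6.4)-(6.5) only need the routing row of each external input and the sets $\mathrm{Outpins}(i,0)$ of the element's output pins. A short sketch, with destinations indexed $0,\ldots,N-1$ and the sets supplied by the caller; the function name is illustrative:

```python
def stage0_streams(lam_j, ca2_j, r_row, outpins_by_port):
    """Rates (6.4) and SCVs (6.5) of the streams from external input j to each output pin i."""
    rates = [lam_j * sum(r_row[t] for t in pins) for pins in outpins_by_port]
    scvs = [ca2_j for _ in outpins_by_port]   # per-bulk routing keeps the external SCV
    return rates, scvs
```

When the routing row sums to one and the Outpins sets of the element's output pins partition the destinations, the stream rates add back up to $\lambda_j$.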

For first stage switching elements with infinite capacity, the SCV of the interdeparture process is clearly given by (cf. (4.13), [10, 12])

$$C_{d_i}^2 = \rho_i(1-\rho_i) + \rho_i^2\, \tilde C_{s_i}^2 + (1-\rho_i)\, C_{a_i}^2, \qquad i = 1,2,\ldots,R, \tag{6.6}$$

where

$$C_{a_i}^2 = \sum_{j=1}^{R} \frac{\lambda_{ji}}{\lambda_i}\, C_{a_{ji}}^2, \qquad \lambda_i = \sum_{j=1}^{R} \lambda_{ji}, \qquad \rho_i = \lambda_i/\tilde\mu_i, \quad i = 1,2,\ldots,R.$$

Note that in this case, each output port behaves as if it were an independent $GE^R/GE/1$ queue with marginal ME QLD $p_r(n_r)$, $n_r = 1,2,\ldots,K$, given in Appendix I.

For first stage switching elements of finite capacity, the SCV of the interdeparture process is clearly given by (cf. (4.13), [10, 12])

$$C_{d_i}^2 = \rho_i(1-\rho_i) + \rho_i^2\, \tilde C_{s_i}^2 + (1-\rho_i)\, C_{a_i}^2, \qquad i = 1,2,\ldots,R, \tag{6.7}$$

where

$$\Lambda_{ji} = \lambda_{ji}(1-\pi_{ji}), \qquad \Lambda_i = \sum_{j=1}^{R} \Lambda_{ji}, \qquad \rho_i = \Lambda_i/\tilde\mu_i,$$

and

$$C_{a_i}^2 = \sum_{j=1}^{R} \frac{\Lambda_{ji}}{\Lambda_i}\, \tilde C_{a_{ji}}^2, \qquad \tilde C_{a_{ji}}^2 = \pi_{ji} + (1-\pi_{ji})\, C_{a_{ji}}^2, \qquad i,j = 1,2,\ldots,R.$$

The CLP $\pi_{ji}$ can be determined from the ME solution of the shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queue (cf. Section 4.2), namely

$$\pi_{ji} = \frac{F_{ji}(K) + C_2(K)}{\sum_{v=0}^{K-1} C_1(v) + C_2(K)}, \qquad i,j = 1,2,\ldots,R, \tag{6.8}$$

where $F_{ji}(K)$ is given by (4.14)-(4.15), incorporating the parameters $\lambda_{ji}$, $C_{a_{ji}}^2$, $\tilde\mu_i$ and $\tilde C_{s_i}^2$, as appropriate.
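For a finite first stage element, (6.7) thins each offered stream by its loss probability before merging the accepted streams and applying the departure formula. A minimal sketch for one output port $i$, assuming the CLPs $\pi_{ji}$ are already available from (6.8) and the effective service parameters from (6.2)-(6.3); the function names are hypothetical:

```python
def accepted_scv(ca2, pi):
    """SCV of the cells of a GE stream that are not lost (Bernoulli thinning), as in (6.7)."""
    return pi + (1.0 - pi) * ca2


def departure_scv_finite(lam_col, ca2_col, pi_col, mu_eff, cs2_eff):
    """C_{d_i}^2 of (6.7) for one output port i; list entries are indexed by stream j."""
    eff = [l * (1.0 - p) for l, p in zip(lam_col, pi_col)]      # Lambda_ji
    lam_i = sum(eff)                                            # Lambda_i
    rho = lam_i / mu_eff
    ca2_i = sum(e * accepted_scv(c, p) for e, c, p in zip(eff, ca2_col, pi_col)) / lam_i
    return rho * (1.0 - rho) + rho ** 2 * cs2_eff + (1.0 - rho) * ca2_i
```

As a sanity check, Bernoulli losses leave Poisson streams Poisson, so with Poisson inputs and exponential effective service the result is exactly 1; with no losses it coincides with (6.6).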

The aggregate blocking probability $\pi_{a_j}$ at input pin $j$ is clearly given by

$$\pi_{a_j} = \sum_{i=1}^{R} \frac{\lambda_{ji}}{\lambda_j}\, \pi_{ji}, \qquad \lambda_j = \sum_{i=1}^{R} \lambda_{ji}, \qquad j = 1,2,\ldots,R. \tag{6.9}$$

Note that $C_1(v)$ is a function of the Lagrangian coefficients $\{g_i, x_i : i = 1,2,\ldots,R\}$ (which can be calculated from the input parameters), whilst $C_2(K)$ depends upon all the Lagrangian coefficients $\{g_i, x_i, y_i : i = 1,2,\ldots,R\}$. The $\{y_i\}$ coefficients are obtained by solving the non-linear equations, which are of the same form as (4.17) if $R > 2$, or (4.18)-(4.19) if $R = 2$. The solution of these equations, along with those of Section 4.2, gives the QLDs of switching elements at stage 0 of the MIN together with other performance metrics.

6.2  Case  2:  Switching  Elements  at  the  Interior  of  the  Network 

When a switching element is internal to the Banyan network at stage $m$, $m = 1,2,\ldots,M-1$, the throughput (effective arrival rate) can be determined in terms of the effective arrival rates of the external input ports, the routing probabilities and the network topology. The SCV of the effective interarrival process is obtained from the SCV of the output process of the previous stage. The values of the Lagrangian coefficients of the ME solution $p(\mathbf{n})$, $\mathbf{n} \in S(K,R)$, can be computed in terms of the parameters of the overall flow, which are related to the parameters of the effective flows and the blocking probabilities. These form a set of equations, additional to those in Section 6.1, which also need to be solved to produce the QLD and other performance metrics for each internal switching element.

Let the effective flow rate that enters input pin $j$ be denoted by $\Lambda_j$, $j = 1,2,\ldots,R$, with its component flow $(j,i)$ going to output port $i$ denoted by $\Lambda_{ji}$, $i = 1,2,\ldots,R$. Let $\tilde C_{a_{ji}}^2$, $i,j = 1,2,\ldots,R$, be the SCV of the effective flow $(j,i)$ and $\pi_{ji}$, $i,j = 1,2,\ldots,R$, be the blocking probability that flow $(j,i)$ will find a full buffer. Using these parameters, the overall flow from each input pin to each output port can be calculated as follows.

The overall flow rate $\lambda_{ji}$ is clearly given by (cf. [12])

$$\lambda_{ji} = \frac{\Lambda_{ji}}{1-\pi_{ji}}, \qquad i,j = 1,2,\ldots,R, \tag{6.10}$$

and, from the GE-type splitting flow formulae,

$$C_{a_{ji}}^2 = \frac{\tilde C_{a_{ji}}^2 - \pi_{ji}}{1-\pi_{ji}}, \qquad i,j = 1,2,\ldots,R, \tag{6.11}$$

where $\pi_{ji}$ is calculated from the ME solution of the shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queue as described in Section 6.1.

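The total effective arrival rate at input pin $j$, $\Lambda_j$, is expressed as

$$\Lambda_j = \sum_{i=1}^{R} \Lambda_{ji}, \qquad j = 1,2,\ldots,R,$$

whilst the transition probability of a job going from input pin $j$ to output pin $i$ is clearly given by

$$\alpha_{ji} = \Lambda_{ji}/\Lambda_j, \qquad i,j = 1,2,\ldots,R.$$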

Packets that arrive in the same batch follow the same route through the network. This means that, within the network, the splitting of departing flows (from individual servers) may be complex, but falls within two schemes. In the first scheme, individual packets choose their own downstream queue, upon service completion, according to a Bernoulli filter. In the second scheme, the routing decision is made on a per-bulk basis: the head of the bulk (i.e., the first packet in the bulk) chooses its downstream queue according to a Bernoulli filter and subsequent members of the bulk follow its path. The second scheme produces bigger arriving bulks than the first scheme. To this end, under the first scheme the effective SCV of the arrival process is determined from the GE-type splitting flow formulae, namely

$$\tilde C_{a_{ji}}^2 = 1 + \big(C_{d_{\mathrm{pred}(j)}}^2 - 1\big)\, \alpha_{ji}, \qquad i,j = 1,2,\ldots,R, \tag{6.12}$$

where $C_{d_{\mathrm{pred}(j)}}^2$ is the SCV of the interdeparture process from the upstream port/switch connected at stage $m-1$ to input pin $j$, whose location is given by $\mathrm{BTM}(j,m)$, $j = 1,2,\ldots,R$, $m = 1,2,\ldots,M-1$.

If the protocol indicates that the entire departing bulk will be directed to the same destination input port, then no splitting takes place and

$$\tilde C_{a_{ji}}^2 = C_{d_{\mathrm{pred}(j)}}^2, \qquad i,j = 1,2,\ldots,R. \tag{6.13}$$
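The conversions used for an interior element, (6.10)-(6.13), are small enough to state in code. A sketch with hypothetical names: the first function gives the transition probabilities $\alpha_{ji}$, the second chooses between Bernoulli splitting (6.12) and per-bulk routing (6.13), and the third undoes the loss thinning to recover the overall rate (6.10) and SCV (6.11).

```python
def transition_probs(Lam_row):
    """alpha_{ji} = Lambda_{ji} / Lambda_j for one input pin j."""
    Lam_j = sum(Lam_row)
    return [l / Lam_j for l in Lam_row]


def effective_scv(cd2_pred, alpha_ji, per_bulk_routing):
    """(6.13) if the whole bulk follows one route, otherwise Bernoulli splitting (6.12)."""
    if per_bulk_routing:
        return cd2_pred
    return 1.0 + (cd2_pred - 1.0) * alpha_ji


def overall_from_effective(Lam_ji, ca2_eff, pi_ji):
    """Overall rate (6.10) and SCV (6.11) from the effective (accepted) flow."""
    lam_ji = Lam_ji / (1.0 - pi_ji)
    ca2_ji = (ca2_eff - pi_ji) / (1.0 - pi_ji)
    return lam_ji, ca2_ji
```

Applying the thinning relation $\tilde C_{a_{ji}}^2 = \pi_{ji} + (1-\pi_{ji})\,C_{a_{ji}}^2$ of (6.7) to the output of `overall_from_effective` returns the effective SCV, so the two directions are consistent. The sketch does not guard against $\tilde C_{a_{ji}}^2 < \pi_{ji}$, where (6.11) would give a negative value.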
Finally, the SCV of the interdeparture process from output port $i$ is given by (4.13), namely

$$C_{d_i}^2 = \rho_i(1-\rho_i) + \rho_i^2\, \tilde C_{s_i}^2 + (1-\rho_i)\, C_{a_i}^2, \qquad i = 1,2,\ldots,R, \tag{6.14}$$

where $\rho_i$ and $C_{a_i}^2$ are defined as in (6.7).
As only the effective arrival parameters are known, the overall arrival parameters are given in terms of the blocking probabilities, which are themselves given by equation (4.14). These equations together form $R\times R$ non-linear simultaneous equations with $R\times R$ unknowns (i.e., the $\pi_{ji}$'s). Writing these equations as functions of the $\pi_{ji}$'s gives

$$\pi_{ji} = \frac{F_{ji}(K) + C_2(K)}{\sum_{v=0}^{K-1} C_1(v) + C_2(K)}, \qquad i,j = 1,2,\ldots,R. \tag{6.15}$$

Assuming that the value of $C_2(K)$ is known, the equations are solved using the Newton-Raphson method to give the values of the $\pi_{ij}$'s. After the $\pi_{ij}$'s are calculated, the $\{y_i\}$ coefficients are obtained by solving the non-linear equations (4.17), if R > 2, or (4.18), if R = 2. In the case of R > 2, a new value for $C_2(K)$ is calculated. This process is repeated until there is no change in the value of $C_2(K)$. The solution of these equations, along with those of Section 4.2, gives the QLDs of switching elements internal to the MIN together with other performance metrics.
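The two-level iteration described above (Newton-Raphson for the $\pi_{ij}$'s at a fixed $C_2(K)$, followed by an update of $C_2(K)$ until it no longer changes) can be sketched generically in Python. This is a minimal sketch under stated assumptions: the actual equations (6.15) and (4.17) are not reproduced here, so `make_system` and `update_c` are hypothetical callables, the Jacobian is approximated by forward differences, and the toy system at the end is invented purely to exercise the loop.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def solve_linear(a: List[List[float]], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ZeroDivisionError("Jacobian is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton_raphson(f: Callable[[Vector], Vector], x0: Vector, tol: float = 1e-10,
                   max_iter: int = 50, h: float = 1e-7) -> Vector:
    """Solve f(x) = 0 using a forward-difference Jacobian."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        n = len(x)
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            xp = x[:]
            xp[j] += h
            fp = f(xp)
            for i in range(n):
                jac[i][j] = (fp[i] - fx[i]) / h
        dx = solve_linear(jac, [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")


def outer_iteration(make_system: Callable[[float], Callable[[Vector], Vector]],
                    update_c: Callable[[Vector], float], c0: float, x0: Vector,
                    tol: float = 1e-9, max_outer: int = 100) -> Tuple[float, Vector]:
    """Alternate: solve the inner system for a fixed c, then recompute c."""
    c, x = c0, list(x0)
    for _ in range(max_outer):
        x = newton_raphson(make_system(c), x)
        c_new = update_c(x)
        if abs(c_new - c) < tol:
            return c_new, x
        c = c_new
    raise RuntimeError("outer iteration did not converge")


def toy_system(c: float) -> Callable[[Vector], Vector]:
    """Hypothetical inner system: x0 = x1 and x0^2 + x1 = c."""
    return lambda x: [x[0] ** 2 + x[1] - c, x[0] - x[1]]


c_final, x_final = outer_iteration(toy_system, lambda x: 1.0 + 0.5 * x[0], 1.0, [0.5, 0.5])
print(c_final, x_final)
```

Keeping the inner solve and the outer update separate mirrors the text: the inner Newton-Raphson step never needs to know how $C_2(K)$ is recomputed.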

6.3  Case  3:  Switching  at  the  Output  Edges 

When a switching element is at the external edge of the MIN, its performance analysis follows from the ME solution of the shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queueing model of an internal switching element, except that the mean rate and SCV of the service time of each output port i are, respectively, the actual parameters $(\mu_i, C_{s_i}^2)$, i = 1,2,...,R.


7  AN  ME  APPROXIMATION  PROCEDURE  FOR  THE  PERFORMANCE 
ANALYSIS  OF  BANYAN  MINS 


In this section an approximate procedure is described for obtaining the ME QLDs and other performance metrics at each building block of a Banyan MIN-based shared buffer ATM switch. The procedures for infinite and finite first stage building blocks differ only in that the latter includes the calculation of the first stage blocking probabilities and of the flow rates through the network, which depend upon these probabilities. The interdeparture processes are assumed to be of GE type. When these processes split into a number of streams directed to different output ports, the splitting is assumed to be Bernoulli. These assumptions give rise to interarrival processes which are approximately the superposition of GE streams. Thus the interarrival processes can be determined and their parameters evaluated.

7.1 An ME Algorithm for the Analysis of Banyan Networks

Begin 

Step 1. Initialise all cell loss probabilities. Set the SCVs of the inter-arrival times to 1;

Step 2. Calculate the effective flows across the Banyan MIN and at each switching element (cf. Section 5.2);

Step 3. At the first stage, represent each of its switching elements as a shared buffer building block queue $S_{R\times R}(GE^R(\lambda_{ji}, C_{a_{ji}}^2)/GE(\mu_i, C_{s_i}^2)/1)/\infty$, $i,j = 1,2,\ldots,R$, in the case of infinite capacity, or as $S_{R\times R}(GE^R(\lambda_{ji}, C_{a_{ji}}^2)/GE(\mu_i, C_{s_i}^2)/1)/K$, $i,j = 1,2,\ldots,R$, in the case of finite capacity, and calculate for each output pin i the SCV of the interdeparture process, $C_{d_i}^2$, $i = 1,2,\ldots,R$, to be used in the next stage, using equations (6.6) and (6.7), as appropriate;

Step 4. From left to right, until the last but one stage: represent each switching element of the stage as a shared buffer building block queue $S_{R\times R}(GE^R(\lambda_{ji}, C_{a_{ji}}^2)/GE(\mu_i, C_{s_i}^2)/1)/K$, $i,j = 1,2,\ldots,R$, and calculate for each output pin i the SCV of the interdeparture process, $C_{d_i}^2$, $i = 1,2,\ldots,R$, for the next stage, using equation (6.14);

Step 5. Analyse the performance of each switching element by solving a shared buffer building block queue $S_{R\times R}(GE^R(\lambda_{ji}, C_{a_{ji}}^2)/GE(\mu_i, C_{s_i}^2)/1)/K$, $i,j = 1,2,\ldots,R$.

For  first  stage  switching  elements  with  infinite  capacity  repeat  Steps  3-5  and  for  the 
corresponding  case  of  finite  capacity  repeat  Steps  2-5  until  convergence  of  the  calculated 
values  of  the  SCV  of  the  interdeparture  times  and  the  blocking  probabilities  of  the  first  stages 
(as  appropriate).  Print  out  ME  QLDs  and  typical  performance  metrics. 

End. 

Remarks 

The main computational effort of the ME algorithm lies, at every iteration, between Steps 3 and 5. The non-linear system of equations $\{y_i: i = 1,2,\ldots,R\}$ for each switching element can be written as $Y = F(Y)$, where Y and F are column vectors of dimension $\Omega$, the cardinality of the set $\{y_i\}$. Similarly, the non-linear equations $\{\pi_{ij}: i,j = 1,2,\ldots,R\}$ can be written as $\Pi = G(\Pi)$, where $\Pi$ and G are column vectors of dimension $\Omega'$, the cardinality of the set $\{\pi_{ij}\}$. It can be verified that the computational cost of the algorithm is $O(ML(\Omega^3 + \Omega'^3))$, where M is the number of stages and L is the number of levels of the MIN, $\Omega^3$ is the number of manipulations for inverting the Jacobian of F with respect to Y and $\Omega'^3$ is the number of manipulations for inverting the Jacobian of G with respect to $\Pi$.

The existence and uniqueness of the solution of the system of non-linear equations is difficult to prove analytically due to the complexity of the expressions for the blocking probabilities $\{\pi_{ij}\}$ and for the Lagrangian coefficients $\{y_i\}$. Furthermore, no strict mathematical justification can be given for the convergence of $\{C_{d_i}^2: i = 1,2,\ldots,R\}$; nevertheless, numerical instabilities or non-convergence have never been observed in the many experiments that have been carried out. If, however, at some iteration it is observed for at least one queue j that $\rho_j = \lambda_j/\mu_j > 1$, then there exists only one trivial solution, equal to 1 for every $i \in \{1,2,\ldots,R\}$, which is outside the domain of validity of the model.

When switching elements of infinite capacity are present at the first stage 0, necessary conditions for the stability of the entire network are not obvious, owing to the constraining influence of downstream blocking on an output port's service rate. In essence, the stability condition for a single output port is that its effective arrival rate be less than its effective service rate, which can only be determined approximately. This subject merits further research.

In  cases  of  hot  spot  routing,  cells  are  directed  towards  one  particular  output  with  a  high 
probability.  As  this  probability  approaches  unity,  the  MIN  becomes  equivalent  to  an  arbitrary 
network  with  blocking  and  has  an  inverted  tree  configuration. 


8  NUMERICAL  RESULTS 

This section presents typical numerical results in Tables 1-12, focusing on 8×8 (cf. Tables 1-11) and 27×27 (cf. Table 12) Banyan MINs with 2×2 and 3×3 switching elements, respectively. The aims of the study are to (i) validate the relative accuracy of the ME approximation algorithm against simulation (SIM) (cf. Tables 1-8), (ii) define experimental bounds (cf. Table 12) and (iii) perform a buffer capacity optimisation across the entire Banyan network (cf. Tables 9-11).

In all experiments, external input ports of the Banyan MIN at stage 0 receive traffic with identical parameters. In total, three different routing schemes are adopted, namely uniform routing (regular traffic) towards the external output pins at final stage 2 (cf. Tables 1-3, 6-12), and moderately or substantially biased routing towards an external output pin, referred to as a warm spot (cf. Table 5) or hot spot (cf. Table 4), respectively. Note that in the case of uniform routing all switching elements belonging to a particular stage will have the same output statistics. However, in the general case of non-uniform routing, switching elements within a stage will have different performance metrics. For each input port at stage 0, and without loss of generality, identical routing probabilities biased towards the warm or hot spot are used in Tables 5 and 4, respectively. As a consequence, switching elements at each stage of the decode tree (i.e., the tree composed from the routes connecting external input pins with the warm or hot spot external output port) will have identical performance metrics. Thus, in both cases of uniform and non-uniform routing, performance metrics are reported only once in Tables 1-12.
Tables 1-8 present a validation study of the ME algorithm against simulation, which includes aggregate MQLs $\{L_i: i = 0,1,2\}$ at stages 0, 1 and 2, throughputs $\{\lambda_2\}$ of either a typical external output port under uniform routing (cf. Tables 1-3, 6-8) or the warm/hot spot external output port (cf. Tables 4 and 5), and also the aggregate CLP of a typical switching element at stage 0. Moreover, Tables 6-8 display aggregate and marginal state probabilities for a typical 8×8 Banyan network under uniform routing. Note that the simulation results in Tables 1-8 were produced at 95% confidence intervals using the Queueing Network Analysis Package (QNAP-2). It can be observed that the ME solutions are consistently comparable with those of simulation (SIM) for a wide range of parameterisations, including deterministic transmission times applicable to ATM switching elements. Note that the confidence intervals are of small magnitude, e.g., typically ±0.01 for MQLs. Moreover, percentage differences for MQLs are generally less than 10%, and error tolerances for state and blocking probabilities (i.e., absolute differences between ME and SIM results) are less than 0.05. The accuracy of the ME approximations begins to deteriorate as the value of the SCVs increases. This can be attributed to further violation of the renewality assumptions of the various flows in the network.
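The remark that ME accuracy deteriorates as the SCVs grow can be checked against the stage-0 MQLs of Table 1 (rows 1-5, λ = 0.5, $C_a^2 = C_s^2$). The short Python sketch below only re-uses numbers printed in Table 1; the percentage difference is taken relative to the SIM value, which is one reasonable convention rather than necessarily the one used by the authors.

```python
# Rows 1-5 of Table 1: (SCV of interarrival and service times, ME L0, SIM L0)
table1_l0 = [
    (3, 3.1760, 3.1100),
    (5, 3.4429, 3.4620),
    (7, 3.5240, 3.6990),
    (11, 3.5525, 4.0030),
    (15, 3.5372, 4.1590),
]


def pct_difference(me: float, sim: float) -> float:
    """Absolute difference of the ME value, as a percentage of the SIM value."""
    return 100.0 * abs(me - sim) / sim


for scv, me, sim in table1_l0:
    print(f"SCV = {scv:2d}: L0 difference = {pct_difference(me, sim):5.2f}%")
```

For this column the differences stay below 10% up to an SCV of 7 and exceed it at SCVs of 11 and 15.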

The ME algorithm is utilised in performing a buffer capacity assignment optimisation across the Banyan MIN (cf. Tables 9-11). Given an overall buffer allocation for the entire network, it is possible to carry out buffer assignments to individual switching elements in order to optimise the throughput or the end-to-end delay. Three different buffer allocation policies are considered, assigning more of the allocated buffer capacity to the first, second and third stages, respectively. From Tables 9-11, it can be observed that by placing more of the buffer allocation at the first stage of the network, the throughput can be increased whilst the end-to-end delay is not adversely affected. This behaviour is intuitively correct, since the CLP is smaller than in the other two cases, thus allowing more packets into the network.

Finally, Table 12 focuses on 27×27 Banyan networks with 3×3 switching elements under regular traffic. Relative performance comparisons are carried out between the ME solutions produced by incorporating the routing of entire bulks within the network (cf. (6.13)) and SIM results produced using specially designed programs written in C. It can be seen that the analytic solutions for the first stage MQL, $\{L_0\}$, and the aggregate CLPs, $\{\pi_a\}$, are comparable in accuracy to those of simulation, as in the examples of Tables 1-8 (n.b., both the ME algorithm and the simulations use identical external inputs at stage 0). However, the ME solutions define (experimentally) pessimistic bounds over the corresponding SIM results concerning the MQLs of output ports at stages 1 and 2. This behaviour is due to the fact that the ME approximation overestimates the size of the bulk transitions in the interior of the network and, consequently, the SCV of the interarrival time of each internal and last stage output port. The study of analytic performance bounds merits further research.

9  CONCLUSIONS 

A cost-effective approximate algorithm, based on the principle of ME and the notion of system decomposition, is proposed for the performance analysis and prediction of packet-switched buffered Banyan MIN-based ATM switch architectures with arbitrary buffer and building block sizes, GE-type interarrival and service times and RS internal blocking. Analytic ME solutions for the QLD of a shared buffer $S_{R\times R}(GE^R/GE/1)/K$ queue, in conjunction with GE-type formulae for the first two moments of the effective service times and traffic flows in the network, play the role of effective building blocks in the decomposition process of the entire network. Numerical results are included to illustrate the relative accuracy of ME approximations against simulation, to define experimental MQL bounds in the interior and last stage of the network and to investigate buffer capacity optimisation across the entire MIN. This study has shown that the ME approximation algorithm is a credible analytic tool for the cost-effective performance modelling and optimisation of complex MINs represented by Banyan networks. The ME algorithm can be extended towards the approximate analysis of ATM switch architectures with space and service priorities. Moreover, closed form expressions for queueing models of ATM networks with both bursty and correlated traffic can be derived based on the stochastic analysis of single finite queues with batch renewal arrival processes (cf. [18]). Extensions of this kind are the subject of current study.

Table 1 Uniform Routing
Input data: {r_ij = 0.125, i,j = 0,1,...,7}; N = 8; μ = 1; K = 9

No.  λ     Ca²   Cs²   L0       L1       L2       λ2       πa       Method
1    0.5   3     3     3.1760   2.6261   2.3536   0.4387   0.1226   ME
                       3.1100   2.7210   2.3530   0.4440   0.1211   SIM
2    0.5   5     5     3.4429   2.7746   2.3691   0.3869   0.2263   ME
                       3.4620   2.9050   2.3300   0.3877   0.2229   SIM
3    0.5   7     7     3.5240   2.8164   2.3376   0.3502   0.2995   ME
                       3.6990   2.9180   2.3020   0.3466   0.3020   SIM
4    0.5   11    11    3.5525   2.8112   2.2586   0.3020   0.3959   ME
                       4.0030   2.8840   2.0940   0.2912   0.4175   SIM
5    0.5   15    15    3.5372   2.7735   2.1887   0.2711   0.4578   ME
                       4.1590   2.8510   1.9650   0.2559   0.4862   SIM
Table 2 Uniform Routing
Input data: {r_ij = 0.125, i,j = 0,1,...,7}; N = 8; μ = 1; K = 9

No.  λ     Ca²   Cs²   L0       L1       L2       λ2       πa       Method
10   0.3   7     7     2.1278   1.6379   1.3560   0.2451   0.1829   ME
                       1.9580   1.5800   1.3510   0.2537   0.1497   SIM
11   0.5   7     7     3.5240   2.8164   2.3376   0.3502   0.2995   ME
                       3.6990   2.9400   2.2480   0.2467   0.3020   SIM
12   0.7   7     7     4.7570   3.7642   2.9907   0.4081   0.4169   ME
                       5.2130   3.7750   2.6850   0.3847   0.4502   SIM

Table 3 Uniform Routing
Input data: {r_ij = 0.125, i,j = 0,1,...,7}; N = 8; μ = 1; K = 5

No.  λ     Ca²   Cs²   L0       L1       L2       λ2       πa       Method
13   0.5   5     0     1.600    1.111    0.8952   0.3253   0.3493   ME
                       1.269    0.8553   0.8404   0.3000   0.2899   SIM
14   0.5   5     1     1.7507   1.2776   1.0493   0.3237   0.3526   ME
                       1.4280   1.1590   1.0640   0.3463   0.3026   SIM
15   0.5   5     3     1.9570   1.5271   1.2600   0.3164   0.3671   ME
                       1.9450   1.5920   1.2760   0.3254   0.3487   SIM
16   0.5   5     5     2.1163   1.7074   1.3837   0.3071   0.3859   ME
                       2.3160   1.7810   1.2890   0.2965   0.4109   SIM
Table 4 Hot Spot Routing
Input data: {r_ij = 0.02, i = 0,1,...,7, j = 1,2,...,7}; {r_ij = 0.86, i = 0,1,...,7, j = 0}; N = 8; μ = 1; K = 9

No.  λ     Ca²   Cs²   L0       L1       L2       λ2       πa       Method
16   0.1   3     1     0.4705   0.9920   2.5333   0.6926   0.0078   ME
                       0.4763   1.0240   2.6470   0.7062   0.0062   SIM
17   0.1   7     3     0.7625   1.6055   2.9622   0.6384   0.1076   ME
                       0.7720   1.7230   3.0470   0.6431   0.1034   SIM

Table 5 Warm Spot Routing
Input data: {r_ij = 0.11, i = 0,1,...,7, j = 1,2,...,7}; {r_ij = 0.23, i = 0,1,...,7, j = 0}; N = 8; μ = 1; K = 9

No.  λ     Ca²   Cs²   L0       L1       L2       λ2       πa       Method
18   0.3   7     3     1.9505   1.6403   1.8753   0.4523   0.1805   ME
                       1.5670   1.4740   1.8440   0.7180   0.1432   SIM
19   0.5   5     5     3.6797   4.2865   4.5339   0.6926   0.2472   ME
                       4.0050   4.4510   4.0750   0.6663   0.2733   SIM

Table 6 First Stage QLDs
Input data: {Ca² = 5, Cs² = 5, λ = 0.5, μ = 1, K = 5, N = 8}; {r_ij = 0.125, i,j = 0,1,...,7}

     Aggregate QLD       Marginal QLD
n    ME       SIM        ME       SIM
0    0.3770   0.3565     0.6351   0.5864
1    0.1230   0.1412     0.1025   0.1417
2    0.1190   0.1300     0.0849   0.1111
3    0.1130   0.1189     0.0694   0.0793
4    0.1070   0.1064     0.0558   0.0501
5    0.1610   0.1470     0.0525   0.0313
Table 7 Second Stage QLDs
Input data: {Ca² = 5, Cs² = 3, λ = 0.5, μ = 1, K = 5, N = 8}; {r_ij = 0.125, i,j = 0,1,...,7}

     Aggregate QLD       Marginal QLD
n    ME       SIM        ME       SIM
0    0.4220   0.3904     0.6589   0.6211
1    0.1740   0.1859     0.1346   0.1605
2    0.1360   0.1420     0.0887   0.1006
3    0.1040   0.1082     0.0576   0.0609
4    0.0790   0.0804     0.0364   0.0348
5    0.0840   0.0931     0.0237   0.0222

Table 8 Third Stage QLDs
Input data: {Ca² = 5, Cs² = 3, λ = 0.5, μ = 1, K = 5, N = 8}; {r_ij = 0.125, i,j = 0,1,...,7}

     Aggregate QLD       Marginal QLD
n    ME       SIM        ME       SIM
0    0.4570   0.4572     0.6814   0.6734
1    0.2000   0.2019     0.1479   0.1575
2    0.1370   0.1368     0.0841   0.0858
3    0.0920   0.0894     0.0471   0.0459
4    0.0600   0.0583     0.0257   0.0236
5    0.0530   0.0564     0.0138   0.0137

Table 9 Buffer Assignment Biased for Stage 0
Input data: {Ca² = Cs² = 5, λ = 0.1, μ = 1, N = 8}; {r_ij = 0.125, i,j = 0,1,...,7}

K0   K1   K2   End-to-End Delay   CLP
9    9    9    9.0565             0.1170
11   8    8    9.5461             0.0838
13   7    7    9.9613             0.0616
15   6    6    10.3868            0.0472
17   5    5    10.9864            0.0392
19   4    4    12.1656            0.0384
21   3    3    15.3603            0.0541
Table 10 Buffer Assignment Biased for Stage 1
Input data: {Ca² = Cs² = 5, λ = 0.1, μ = 1, K = 5, N = 8};

Table 11 Buffer Assignment Biased for Stage 2
Input data: {Ca² = Cs² = 5, λ = 0.1, μ = 1, K = 5, N = 8};


APPENDIX I DERIVATION OF AN ME QLD FOR A STABLE GE^R/GE/1 QUEUE

Consider a stable FCFS $GE^R/GE/1$ single server queue i, depicted in Figure 4. The queue receives a multiple input of R streams with GE-type interarrival parameters $(\lambda_{ji}, C_{a_{ji}}^2)$, $j = 1,2,\ldots,R$. Moreover, the server provides GE-type service times with parameters $(\mu_i, C_{s_i}^2)$.
Table 12 Analytical Bounds over Simulation for 27x27 MINs
Input data: {r_ij = 0.037, i,j = 0,1,...,26}; N = 27; μ = 1; K = 9

No.  λ      Ca²   Cs²   L0       L1       L2       πa       Method
18   0.10   1     1     0.3330   0.3330   0.3330   0.0000   ME
                        0.3339   0.3331   0.3333   0        SIM
19   0.25   1     1     0.9999   0.9999   0.9999   0.0001   ME
                        0.9969   1.0330   0.9997   0.0001   SIM
20   0.10   7     1     0.8925   0.9937   0.8475   0.1241   ME
                        0.9000   0.5600   0.4800   0.1234   SIM
21   0.25   7     1     1.9750   2.2800   1.8069   0.1973   ME
                        2.2020   1.3600   1.2203   0.2022   SIM
22   0.10   3     7     0.7200   0.7021   0.6927   0.0112   ME
                        0.7541   0.6918   0.6378   0.0126   SIM
23   0.25   7     3     2.1780   2.4270   1.9477   0.1921   ME
                        2.3981   1.8912   1.5398
Figure 4. A stable $GE^R/GE/1$ queue i
Suppose all that is known about the $GE^R/GE/1$ queue is the server utilisation, $\rho_i$, and the MQL, $L_i$. Entropy maximisation, subject to normalisation, utilisation and MQL constraints, implies that the QLD of the $GE^R/GE/1$ queue is given by

$p_i(n_i) = \frac{1}{Z_i}\, g_i^{h(n_i)}\, x_i^{n_i}, \quad n_i = 0,1,2,\ldots, \qquad (A1)$

where $Z_i = 1/p_i(0)$ is the normalising constant, $h(n_i)$ is an auxiliary function defined by $h(n_i) = 1$ if $n_i > 0$ or 0 otherwise, and $\{g_i, x_i\}$ are the Lagrangian coefficients corresponding to the utilisation and MQL constraints, respectively.

The server utilisation, $\rho_i$, is clearly expressed by

$\rho_i = \sum_{j=1}^{R} \rho_{ji}, \quad \rho_{ji} = \lambda_{ji}/\mu_i, \quad j = 1,2,\ldots,R. \qquad (A2)$


Moreover, an expression for the MQL, $L_i$, can be obtained from the generalised P-K expression for a stable $M^B/G/1$ queue [9], namely

$L_i = \frac{\rho_i}{2} + \frac{1}{2(1-\rho_i)}\left(\rho_i^2 C_{s_i}^2 + \rho_i b_i \left(C_{b_i}^2 + 1\right)\right), \quad i = 1,2,\ldots,R, \qquad (A3)$

where $b_i$ is the mean and $C_{b_i}^2$ is the SCV of the bulk size distribution.


In the case of a number of arriving bulk Poisson streams with parameters $b_{ji}$ and $C_{b_{ji}}^2$, $j = 1,2,\ldots,R$, respectively, the overall arrival stream is another bulk Poisson stream with mean, $b_i$, and SCV, $C_{b_i}^2$. The latter parameters can be determined via the law of total moments, namely,

$b_i = \sum_{j=1}^{R} b_{ji}\, p_{ji}, \quad i = 1,2,\ldots,R, \qquad (A4)$

where

$p_{ji} = \frac{\lambda_{ji}}{\lambda_i}, \quad \lambda_i = \sum_{j=1}^{R} \lambda_{ji}, \quad i = 1,2,\ldots,R,$

and

$C_{b_i}^2 = \frac{b_i^{(2)} - b_i^2}{b_i^2}, \quad i = 1,2,\ldots,R, \qquad (A5)$
where $b_i^{(2)}$ is the second moment of the bulk size of the overall stream, namely

$b_i^{(2)} = \sum_{j=1}^{R} b_{ji}^{(2)}\, p_{ji}, \quad i = 1,2,\ldots,R, \qquad (A6)$

where $b_{ji}^{(2)}$ is the second moment of the bulk size for stream j.
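The law of total moments in (A4)-(A6) amounts to mixing the bulk-size distributions of the individual streams. A minimal Python sketch, assuming the weights $p_{ji} = \lambda_{ji}/\lambda_i$ exactly as stated under (A4) and using $b^{(2)} = (C_b^2 + 1)\,b^2$ to turn each stream's SCV into a second moment; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def merge_bulk_streams(rates: Sequence[float], means: Sequence[float],
                       scvs: Sequence[float]) -> Tuple[float, float]:
    """Overall bulk mean b_i (A4) and SCV Cb_i^2 (A5), using (A6) for the
    second moment and the weights p_ji = lambda_ji / lambda_i."""
    total = sum(rates)
    weights = [lam / total for lam in rates]
    b = sum(w * m for w, m in zip(weights, means))
    # second moment of stream j: b_ji^(2) = (Cb_ji^2 + 1) * b_ji^2
    b2 = sum(w * (c + 1.0) * m * m for w, m, c in zip(weights, means, scvs))
    return b, (b2 - b * b) / (b * b)


# Two hypothetical streams with constant bulks of size 1 and 3 and equal rates
print(merge_bulk_streams([1.0, 1.0], [1.0, 3.0], [0.0, 0.0]))  # (2.0, 0.25)
```

Even when each individual stream has deterministic bulks ($C_b^2 = 0$), the mixture has a positive SCV, which is why the superposition is characterised by a new bulk distribution rather than by averaging SCVs.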

Manipulations lead to the relations

$\left(C_{b_i}^2 + 1\right)\lambda_i b_i = \sum_{j=1}^{R} \left(C_{b_{ji}}^2 + 1\right)\lambda_{ji} b_{ji}, \quad i = 1,2,\ldots,R. \qquad (A7)$


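Substituting into the generalised P-K expression (A3), the following formula for the MQL of a stable $M^B/G/1$ queue with an aggregate of R multiple input streams is obtained:

$L_i = \frac{\rho_i}{2} + \frac{1}{2(1-\rho_i)}\left(\rho_i^2 C_{s_i}^2 + \sum_{j=1}^{R} \rho_{ji} b_{ji}\left(C_{b_{ji}}^2 + 1\right)\right), \quad i = 1,2,\ldots,R. \qquad (A8)$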


Note that the superposition of R GE-type streams results in an overall bulk Poisson process, but the bulk size distribution is determined by a weighted sum (mixture) of geometric distributions. Moreover, the individual parameters of each bulk size distribution, $b_{ji}$ and $C_{b_{ji}}^2$, can be expressed by

$b_{ji} = \frac{C_{a_{ji}}^2 + 1}{2}, \quad i,j = 1,2,\ldots,R, \qquad (A9)$

and

$C_{b_{ji}}^2 = \frac{C_{a_{ji}}^2 - 1}{C_{a_{ji}}^2 + 1}, \quad i,j = 1,2,\ldots,R. \qquad (A10)$


Using expressions (A9) and (A10), formula (A8) becomes identical to MQL expression (4.9). Moreover, substituting (A1) into the constraints of utilisation, $\rho_i$, and MQL, $L_i$, and carrying out some manipulations, the Lagrangian coefficients $g_i$ and $x_i$ are determined via expressions (4.14).
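For an infinite-capacity queue the ME solution has a closed form. Imposing normalisation, utilisation and MQL constraints on the product form (A1) gives $p_i(0) = 1 - \rho_i$, $x_i = (L_i - \rho_i)/L_i$ and $g_i = \rho_i^2/((1-\rho_i)(L_i - \rho_i))$; these closed forms are derived here from (A1) and are not copied from (4.14), which lies outside this section. The Python sketch below combines them with (A2) and (A8)-(A10); it assumes every $C_{a_{ji}}^2 \ge 1$, as required for a GE stream, and its function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mql_ge_multi(mu: float, rates: Sequence[float], ca2: Sequence[float],
                 cs2: float) -> Tuple[float, float]:
    """Utilisation rho_i (A2) and MQL L_i (A8) of a GE^R/GE/1 queue,
    with each stream's bulk parameters taken from (A9)-(A10)."""
    rho_j = [lam / mu for lam in rates]
    rho = sum(rho_j)
    if rho >= 1.0:
        raise ValueError("the queue is unstable (rho >= 1)")
    bulk_terms = 0.0
    for r, c in zip(rho_j, ca2):
        if c < 1.0:
            raise ValueError("GE streams require an interarrival SCV >= 1")
        b = (c + 1.0) / 2.0            # (A9): mean geometric bulk size
        cb2 = (c - 1.0) / (c + 1.0)    # (A10): bulk-size SCV
        bulk_terms += r * b * (cb2 + 1.0)
    mql = rho / 2.0 + (rho ** 2 * cs2 + bulk_terms) / (2.0 * (1.0 - rho))
    return rho, mql


def me_qld(rho: float, mql: float, n_max: int) -> List[float]:
    """ME QLD (A1) truncated at n_max: p(0) = 1 - rho, p(n) = (1 - rho) g x^n."""
    x = (mql - rho) / mql
    g = rho ** 2 / ((1.0 - rho) * (mql - rho))
    return [(1.0 - rho) * (g if n > 0 else 1.0) * x ** n for n in range(n_max + 1)]


rho, mql = mql_ge_multi(mu=1.0, rates=[0.3, 0.3], ca2=[5.0, 3.0], cs2=2.0)
qld = me_qld(rho, mql, n_max=1000)
print(round(mql, 4), round(sum(qld), 6))
```

Since $b_{ji}(C_{b_{ji}}^2 + 1)$ reduces to $C_{a_{ji}}^2$, the loop could be shortened, but it is kept explicit to follow (A9)-(A10). For a single Poisson stream with exponential service the sketch reproduces the geometric M/M/1 distribution ($x_i = \rho_i$, $g_i = 1$).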
APPENDIX II DERIVATION OF THE LAGRANGIAN COEFFICIENTS {y_i, i = 1,2,...,R} FOR THE CASE OF R = 2


The Lagrangian coefficients $\{y_i: i = 1,2,\ldots,R\}$ of the $S_{R\times R}(GE^R/GE/1)/K$ queue are determined by solving the set of nonlinear simultaneous equations (4.17). These can be written as

$C^{(i)}(K)[Y] = C_i(K), \quad i = 1,2,\ldots,R, \qquad (A11)$

where $[Y]$ denotes the vector of $y_i$'s. In the case of R = 2, equation (A11) can be solved analytically as follows:


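$C^{(1)}(K)[Y] = g_1 x_1 y_1 \left(x_1^{K-1} + x_1^{K-2} x_2 + \cdots + x_2^{K-1}\right) + g_2 x_2 y_2\, g_1 x_1 y_1 \left(x_1^{K-2} + x_1^{K-3} x_2 + \cdots + x_2^{K-2}\right), \qquad (A12)$

and

$C^{(2)}(K)[Y] = g_2 x_2 y_2 \left(x_1^{K-1} + x_1^{K-2} x_2 + \cdots + x_2^{K-1}\right) + g_2 x_2 y_2\, g_1 x_1 y_1 \left(x_1^{K-2} + x_1^{K-3} x_2 + \cdots + x_2^{K-2}\right), \qquad (A13)$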


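which leads to

$C^{(2)}(K)[Y] - C^{(1)}(K)[Y] = g_2 x_2 y_2 \left(x_1^{K-1} + x_1^{K-2} x_2 + \cdots + x_2^{K-1}\right) - g_1 x_1 y_1 \left(x_1^{K-1} + x_1^{K-2} x_2 + \cdots + x_2^{K-1}\right). \qquad (A14)$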


Equation (A14) can be simplified by using the identity

$\left(x_1^{K-1} + x_1^{K-2} x_2 + \cdots + x_2^{K-1}\right)(x_1 - x_2) = x_1^K - x_2^K.$

To this end, solving (A14) with respect to the Lagrangian coefficient $y_2$, it follows that (4.19) holds.


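Substituting $y_2$ into (A12), the following equation is obtained:

$0 = y_1^2 (g_1 x_1)^2 \frac{x_1^{K-1} - x_2^{K-1}}{x_1 - x_2} + y_1 g_1 x_1 \left(\frac{x_1^K - x_2^K}{x_1 - x_2} + \frac{\left(x_1^{K-1} - x_2^{K-1}\right)\left(C_2(K) - C_1(K)\right)}{x_1^K - x_2^K}\right) - C_1(K). \qquad (A15)$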
Solving equation (A15) for $y_1$ and taking the positive root yields expression (4.18).
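The R = 2 derivation can be checked numerically. The Python sketch below, with hypothetical function names, evaluates the left-hand sides of (A12)-(A13), recovers $y_1$ as the positive root of the quadratic obtained by substituting (A14) into (A12), and then recovers $y_2$ from (A14). It keeps the sums $x_1^{k} + x_1^{k-1}x_2 + \cdots + x_2^{k}$ in unsimplified form, which avoids dividing by $x_1 - x_2$ when $x_1 = x_2$; it assumes $K \ge 2$.

```python
import math
from typing import Tuple

Pair = Tuple[float, float]


def h_sum(x1: float, x2: float, k: int) -> float:
    """x1^k + x1^(k-1) x2 + ... + x2^k; an empty sum (0) when k < 0."""
    return sum(x1 ** (k - j) * x2 ** j for j in range(k + 1))


def constraints(g: Pair, x: Pair, y: Pair, K: int) -> Pair:
    """Left-hand sides C^(1)(K)[Y] and C^(2)(K)[Y] of (A12) and (A13)."""
    u1, u2 = g[0] * x[0] * y[0], g[1] * x[1] * y[1]
    s1, s2 = h_sum(x[0], x[1], K - 1), h_sum(x[0], x[1], K - 2)
    return u1 * s1 + u1 * u2 * s2, u2 * s1 + u1 * u2 * s2


def solve_y_r2(g: Pair, x: Pair, K: int, c1: float, c2: float) -> Pair:
    """Recover (y1, y2) from C_1(K), C_2(K) for R = 2 (requires K >= 2)."""
    if K < 2:
        raise ValueError("K must be at least 2")
    s1, s2 = h_sum(x[0], x[1], K - 1), h_sum(x[0], x[1], K - 2)
    d = (c2 - c1) / s1                 # from (A14): u2 - u1 = (C2 - C1) / s1
    a, b = s2, s1 + s2 * d             # quadratic a u1^2 + b u1 - C1 = 0, cf. (A15)
    u1 = (-b + math.sqrt(b * b + 4.0 * a * c1)) / (2.0 * a)  # positive root
    u2 = u1 + d
    return u1 / (g[0] * x[0]), u2 / (g[1] * x[1])


g, x = (0.8, 1.2), (0.6, 0.4)
c1, c2 = constraints(g, x, (0.7, 0.9), K=5)
print(solve_y_r2(g, x, 5, c1, c2))  # approximately (0.7, 0.9)
```

Because the product of the two roots of the quadratic is $-C_1(K)/a < 0$, exactly one root is positive, which is why the derivation can simply take the positive root.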

APPENDIX  III  METHOD  OF  CALCULATING  THE  INPINS  AND  OUTPINS 
SETS 

Let Inpins[i,m] be the set of external input pins of the Banyan network which are connected to an interior input-port pin at position (i,m) of the array of input pins. Likewise, let Inpins'[i,m] be the set of external input pins that are linked to an interior output pin at position (i,m) of the array of output pins. Let S be the set of input pins that constitute the inputs of a particular switch. Each switch fully connects all its input pins to its output pins; therefore the sets Inpins[i,m] and Inpins'[i,m] are related as follows:

$\mathrm{Inpins}'[i,m] = \bigcup_{k \in S} \mathrm{Inpins}[k,m].$


Each interior input-port pin is connected to one output pin from the previous stage, which can be determined from the backwards topology matrix (BTM), i.e., the input pin at position (i,m) connects with the output pin located at BTM[i,m], which is of course at stage m-1.

Thus, each input pin 'inherits' its Inpins set from the output pin that it is connected to, i.e.,

$\mathrm{Inpins}[i,m] = \mathrm{Inpins}'[\mathrm{BTM}[i,m], m-1].$

The input pins at the input edge of the network have only one element in their Inpins set, as they correspond to a particular external input pin, i, i = 0,1,...,N-1, i.e.,

Inpins[i,0] = {i}.

Thus the Inpins sets at each level are obtained via the following procedure, where S denotes the set of input pins of the switch containing pin i:

for i = 0 to N-1
    Inpins[i,0] = {i}
end i
for m = 1 to M-1
    for i = 0 to N-1
        Inpins'[i,m-1] = ∪_{k∈S} Inpins[k,m-1]
    end i
    for i = 0 to N-1
        Inpins[i,m] = Inpins'[BTM[i,m], m-1]
    end i
end m

The method of calculating the Outpins sets is similar to the one for determining the Inpins sets but is applied in reverse order.
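The Inpins procedure can be made runnable as follows. The Python sketch assumes R×R switches whose input pins occupy consecutive positions (so S is the block of R pins containing pin i) and, because the paper's BTM is not shown, it uses a hypothetical perfect-shuffle wiring (an Omega-style network, one member of the Banyan family) as the BTM. Unlike the pseudocode, it also fills in Inpins' for the last stage. All function names are illustrative.

```python
from typing import Callable, Dict, Set, Tuple

PinKey = Tuple[int, int]  # (pin position i, stage m)


def switch_inputs(i: int, r: int) -> range:
    """The set S: input pins of the r x r switch containing pin position i."""
    first = (i // r) * r
    return range(first, first + r)


def union_over_switch(inpins: Dict[PinKey, Set[int]], i: int, m: int, r: int) -> Set[int]:
    """Inpins'[i, m]: union of Inpins[k, m] over k in S (full connectivity)."""
    result: Set[int] = set()
    for k in switch_inputs(i, r):
        result |= inpins[(k, m)]
    return result


def compute_inpins(n: int, r: int, stages: int, btm: Callable[[int, int], int]):
    inpins: Dict[PinKey, Set[int]] = {(i, 0): {i} for i in range(n)}
    inpins_out: Dict[PinKey, Set[int]] = {}
    for m in range(1, stages):
        for i in range(n):
            inpins_out[(i, m - 1)] = union_over_switch(inpins, i, m - 1, r)
        for i in range(n):
            # input pin (i, m) inherits the set of output pin BTM[i, m] at stage m-1
            inpins[(i, m)] = inpins_out[(btm(i, m), m - 1)]
    # Extra step beyond the pseudocode: output pins of the last stage
    for i in range(n):
        inpins_out[(i, stages - 1)] = union_over_switch(inpins, i, stages - 1, r)
    return inpins, inpins_out


def perfect_shuffle_btm(n_bits: int) -> Callable[[int, int], int]:
    """Hypothetical BTM: input pin i is fed by output pin unshuffle(i),
    i.e. the n_bits-bit label of i rotated right by one position."""
    def btm(i: int, m: int) -> int:
        return (i >> 1) | ((i & 1) << (n_bits - 1))
    return btm


inpins, inpins_out = compute_inpins(n=8, r=2, stages=3, btm=perfect_shuffle_btm(3))
print(sorted(inpins_out[(0, 1)]))
```

In this 8×8 example every output pin of the last stage is reachable from all eight external inputs, as expected for a Banyan-type MIN; the Outpins sets would follow by running the same idea from the output side.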
Abstract

The derivation of queue lengths and buffer fills in ATM networks with particular types of cell arrival is known explicitly for some specific cases, but often one must resort to simulation, which is time consuming and whose accuracy is difficult to determine, particularly when one is considering cell losses or CDV around the 10^-8 level. We describe an alternative technique which involves iteration of the Markov transition and state probabilities for a queue; we have called this Animation. Its principle is simply that of repetitively applying the appropriate Markov chain transition matrix to the current queue length probabilities to produce a new set of probabilities. In the case of a known cell sequence arrival, the transition matrix changes at each iteration to reflect the arrival probability. Animation presents some numerical difficulties, and techniques are described to overcome these. The specific application described is the derivation of the equivalent bandwidth of cell patterns resulting from ATM policing or shaping, and the transit of these cells through networks containing multiple switching stages.

Keywords 

ATM,  bandwidth,  policing,  queuing,  switching 


1.  INTRODUCTION 

In connection with studies in policing functions for ATM [1] we had to consider their limitations and, in particular, the most adverse patterns (MAP) of full cells which would meet the policing criteria and hence could be applied to the network by users. To give an idea of the problem, a typical MAP would be 10 full, 25 empty, 11 full, 26 empty, 11 full, 26 empty, 5 full and 133 empty cells. After this the sender could repeat the pattern. For different system parameters the number of full cells in each burst could be much reduced and the number of bursts could be many more; the number of final empty cells may rise to several thousand. In order to assess the impact of a stream with such a MAP we needed to estimate its equivalent bandwidth, and the method chosen was to add the stream to an infinite number of random sources with a Poisson load of 0.5. The resulting buffer fill probability would then be compared with a pure Poisson load adjusted to give a similar distribution in the area of probabilities of 10^-8. The excess of this second Poisson load over the background Poisson load of 0.5 would be regarded as the equivalent bandwidth of the MAP. To avoid the introduction of an arbitrary parameter, the buffer was assumed to be infinite. A Poisson background load seemed appropriate in the absence of any clear understanding of the actual statistics: in circumstances where there is traffic from a large number of sources ranging from CBR to bursty, central limit considerations suggested this choice.

Although the concept appeared reasonable, its evaluation was not obvious. Analytic solutions to the general Poisson + pattern queuing problem are not known; the other alternative was simulation, but to get comparisons at the 10^-8 level an impractical number of cells would be needed. From this need the technique of Markov chain "animation" developed.


2.  BACKGROUND 


The Markov chain is well established as a conceptual device for describing the manner in which state probabilities are modified by events and their probabilities. In ATM terms the classic such chain is the buffer fed by a multiplicity of sources.

Figure 1 Markov Chain

In figure 1 the circles represent the states; in the case of the buffer they are the fills of the buffer, each with its associated probability s0, s1, s2, etc. The circular arrows represent the probability that after an event the machine remains in that same state, s00, s11, s22, etc. The curved arrows represent the probabilities of transition from one state to another. In the case of a buffer with a single server, s11, s22, etc. will equal the probability, p1, that one cell arrives. In the case of s00 this equals the probability that one or zero cells arrive. Similarly, the probability that the machine moves from state n (n cells in buffer) to state n + 1 (n + 1 cells in buffer), i.e. s01, s12, s23, etc., is the probability, p2, that 2 cells arrive; the probability that the machine moves from state n to state n - 1, i.e. s10, s21, s32, etc., is the probability, p0, that no cells arrive.

The customary use of this information is to set up a system of equations whose explicit solution gives the steady-state values of the state probabilities. This paper explores the animation method, in which the Markov chain is applied iteratively to a set of data to derive solutions to queuing problems.
3. THE ANIMATION TECHNIQUE


In a machine with a finite number of states the initial state may be represented by a vector S of the form $[s_0, s_1, s_2, \ldots]$. Multiplication by the transition matrix produces a new vector S' representing the state probabilities after the first event (e.g. a batch of cells arriving and one being served). Thus one could start from the initial condition in which no cells have arrived for a very long time, $s_0 = 1$ and $s_1 = s_2 = \ldots = 0$. After the first event $s_0 = s_{00}$, $s_1 = s_{01}$, $s_2 = s_{02}$, etc. The next multiplication establishes the state after 2 events, and so on. Continued multiplication makes S tend to the steady-state condition calculated in the usual explicit way.

Figure 2 Simple Animation

As an example of the process in action, consider a queue of maximum length 6 fed by two sources, each with probability of a full cell = 0.25, giving a total load of 0.5. This simplest of Binomial distributions gives the probability of an event with no full cells arriving, $p_0 = 0.5625$; with 1 cell, $p_1 = 0.375$; with 2 cells, $p_2 = 0.0625$. With a single cell served per event, $s_{11} = s_{22} = s_{33} = s_{44} = s_{55} = 0.375$; $s_{01} = s_{12} = s_{23} = s_{34} = s_{45} = s_{56} = 0.0625$; $s_{65} = s_{54} = s_{43} = s_{32} = s_{21} = s_{10} = 0.5625$; $s_{00} = 0.9375$ (the probability of no cells plus the probability of one cell); and $s_{66} = 0.4375$ (the probability of one cell plus the probability of two cells). All other values in the transition matrix are zero, as the corresponding transitions are not possible. These probabilities can be applied repetitively to a starting state as shown in Figure 2. After 30 iterations stability is achieved to 6 decimal places. In effect this process gives the transient state probabilities, which are only rarely of interest. However, it may be used to find the steady-state probabilities in cases where an explicit solution is obscure. It is of further use where the load (and hence the transition matrix) fluctuates in a specific way, and this is the subject of the core of this paper.
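The animation loop itself is short. The sketch below is a minimal illustration in Python, not the author's program: `arrival_pmf`, `transition_matrix` and `animate` are hypothetical names, the matrix is built for the two-source example just described, and cells that would overflow the full buffer are simply lost (which is what makes $s_{66} = p_1 + p_2$).

```python
from math import comb


def arrival_pmf(n_sources, p_full):
    """Binomial probability that k cells arrive in one event, k = 0..n_sources."""
    return [comb(n_sources, k) * p_full**k * (1 - p_full) ** (n_sources - k)
            for k in range(n_sources + 1)]


def transition_matrix(p, max_len):
    """T[i][j]: probability of moving from i to j queued cells in one event.

    Each event adds k cells with probability p[k] and serves one cell; the
    result is clipped at 0 (empty queue) and at max_len (overflow is lost).
    """
    size = max_len + 1
    T = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for k, pk in enumerate(p):
            j = min(max(i + k - 1, 0), max_len)
            T[i][j] += pk
    return T


def animate(T, n_events):
    """Apply the row-vector update S' = S T repeatedly, starting from s0 = 1."""
    size = len(T)
    S = [1.0] + [0.0] * (size - 1)            # no cells for a very long time
    for _ in range(n_events):
        S = [sum(S[i] * T[i][j] for i in range(size)) for j in range(size)]
    return S


p = arrival_pmf(2, 0.25)                      # two sources, 0.25 each
T = transition_matrix(p, max_len=6)
S = animate(T, 200)
print([round(x, 6) for x in S])
```

Because only single-step moves are possible here, the steady state is geometric with ratio $p_2/p_0 = 1/9$; this matches the "Initial" column of Table 2A.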


4.  THE  INFINITE  QUEUE 

In  the  introduction  it  was  mentioned  that  derivation  of  the  equivalent  bandwidth  was  to 
be  done  by  considering  the  probability  distributions  of  an  infinite  queue  loaded  with: 

a)  a  background  Poisson  source  +  pattern 

b)  a  second  purely  Poisson  source 

An infinite queue was chosen to avoid introducing another arbitrary parameter, but it presents obvious computational difficulties, as only finite queues can be handled numerically.
Extending Figure 2 to the infinite case (Figure 3) shows the difficulty: however large the queue length chosen, with only the background source present the probability of the last state computed, Z, always needs one further element, $Z^+$, to compute its next value. Assuming that $Z^+$ is zero considerably distorts the values in the upper range of the queue, so a much longer queue has to be computed, with run-time penalties. However, a simple alternative follows from the well-known observation that the state probabilities closely approximate a geometric series at low probabilities. Setting $Z^+ = Z^2/Y$, where Y is the probability of the state just below Z, makes the computation apparently error free for practical purposes. This is indicated in Figure 3 by the curved arrows.

Figure 3 Infinite Queue
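A sketch of the truncated "infinite" queue follows, assuming Poisson background arrivals, one cell served per slot, and the state counted after service. It is illustrative only: `poisson_pmf`, `step_infinite` and `animate_infinite` are hypothetical names, and the renormalisation after each step anticipates the normalisation procedure described in section 5.

```python
import math


def poisson_pmf(load, kmax):
    """Poisson probabilities p_k of k cell arrivals in one slot, k = 0..kmax."""
    return [math.exp(-load) * load**k / math.factorial(k) for k in range(kmax + 1)]


def step_infinite(s, p):
    """One slot: batch arrivals with pmf p, then one cell served.

    s holds states 0..N of an (in principle) infinite queue. State N needs the
    unrepresented state N+1, which is extrapolated geometrically: Z+ = Z^2 / Y.
    """
    N = len(s) - 1
    z_plus = s[N] ** 2 / s[N - 1] if s[N - 1] > 0.0 else 0.0
    ext = s + [z_plus]
    new = [0.0] * (N + 1)
    new[0] = ext[0] * (p[0] + p[1]) + ext[1] * p[0]
    for n in range(1, N + 1):
        # state n is reached from state j = n + 1 - i when i cells arrive
        new[n] = sum(ext[n + 1 - i] * p[i] for i in range(min(n + 2, len(p))))
    total = sum(new)
    return [x / total for x in new]


def animate_infinite(load, n_states, n_events):
    """Iterate from an empty queue; returns probabilities of states 0..n_states-1."""
    p = poisson_pmf(load, n_states + 1)
    s = [1.0] + [0.0] * (n_states - 1)
    for _ in range(n_events):
        s = step_infinite(s, p)
    return s


s30 = animate_infinite(0.5, 31, 500)          # states 0..30
s45 = animate_infinite(0.5, 46, 500)          # states 0..45
```

Comparing two truncation lengths is a simple way to check that, with the extrapolated element, the lower part of the distribution no longer depends on where the queue is cut off. The empty-queue probability can also be checked against $(1 - \lambda)/p_0 = (1 - \lambda)e^{\lambda}$, which follows from equating the throughput $1 - s_0 p_0$ to the load $\lambda$.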


5.  ROUNDING  ERRORS 

For  a  Poisson  source  of  rate  0.5,  calculation  of  the  buffer  fill  probabilities  by  the 
explicit  method  or  by  continuous  iteration  results  in  obvious  errors  at  low  probabilities 
owing  to  rounding  errors,  even  using  double  length  arithmetic  as  shown  in  Table  1.  It 
is  clear  that  the  figures  are  tending  to  a  small  (in  this  case  negative)  residual  constant 
value  rather  than  the  expected  value  of  zero.  For  the  work  to  be  described  later  it  is 
necessary  to  continue  the  iterations  considerably  more  than  this  initial  simple  example 
required.  This  leads  to  a  residual  constant  of  ever  increasing  magnitude  which,  if  left 
uncorrected,  would  limit  the  utility  of  the  technique.  A  solution  to  this  problem  also 
exploits  the  fact  that  the  values  should  approximate  a  geometric  series.  Simple  algebra 
can  determine  the  value  of  the  residual  constant. 

If $s_x$, $s_y$, $s_z$ are the values of the last three probabilities calculated, and they form a geometric progression with a constant error $\varepsilon$, then

$$s_z = s + \varepsilon, \qquad s_y = k s + \varepsilon, \qquad s_x = k^2 s + \varepsilon.$$

Eliminating s and k, the constant error is given by:

$$\varepsilon = \frac{s_x s_z - s_y^2}{s_z - 2 s_y + s_x}$$
Queue       Uncorrected    Corrected
position    probability    probability
20          2.03E-11
21          5.77E-12
22          1.64E-12
23          4.68E-13
24          1.33E-13
25          3.79E-14
26          1.08E-14
27          3.07E-15
28          8.73E-16       8.75E-16
29          2.47E-16       2.49E-16
30          6.88E-17       7.09E-17
31          1.81E-17       2.02E-17
32          3.67E-18       5.74E-18
33          -4.35E-19      1.64E-18
34          -1.60E-18      4.65E-19
35          -1.94E-18      1.33E-19
36          -2.03E-18      3.77E-20
37          -2.06E-18      1.07E-20
38          -2.07E-18      3.06E-21
39          -2.07E-18      8.70E-22
40          -2.07E-18      2.48E-22

Table 1 Effect of Rounding Errors

The third column of Table 1 shows the effect of subtracting this constant. The process works well until the denominator in the above expression becomes too small (values of s around $10^{-26}$). In the case demonstrated, the IEEE 64-bit floating point format was used, with sign (1 bit), exponent (11 bits) and mantissa (52 bits). It is convenient to incorporate this correction into a normalisation procedure applied after every iteration; the procedure can also sum the queue probabilities to ensure that rounding errors do not cause the overall total to deviate from unity, neglecting the very small probability that the queue will exceed the length computed. Using this technique reduces computation time when low-probability events are of interest, as it avoids the use of extended-length arithmetic.
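The correction can be sketched as below, assuming the last three computed terms are used as $s_x$, $s_y$, $s_z$. The helpers `residual_constant` and `normalise` are hypothetical, and the check uses synthetic data (a geometric tail plus a constant offset of the size seen in Table 1), not the paper's figures.

```python
def residual_constant(sx, sy, sz):
    """Constant error eps such that sx - eps, sy - eps, sz - eps are geometric
    (s_z = s + eps, s_y = k s + eps, s_x = k^2 s + eps)."""
    denom = sz - 2.0 * sy + sx
    if denom == 0.0:
        # the tail has collapsed onto the residual; nothing left to estimate from
        raise ZeroDivisionError("denominator too small to estimate the residual")
    return (sx * sz - sy * sy) / denom


def normalise(s):
    """Subtract the residual estimated from the last three terms, then rescale
    so that the probabilities sum to one."""
    eps = residual_constant(s[-3], s[-2], s[-1])
    corrected = [x - eps for x in s]
    total = sum(corrected)
    return [x / total for x in corrected]


# Synthetic check: a geometric tail plus a constant rounding-like offset.
r = 0.2847
true_tail = [(1 - r) * r**n for n in range(31)]
eps_true = -2.07e-18
noisy = [x + eps_true for x in true_tail]
eps_est = residual_constant(noisy[-3], noisy[-2], noisy[-1])
fixed = normalise(noisy)
```

The guard only catches an exactly zero denominator; in practice, as noted above, the estimate degrades well before that, once the true tail falls towards the size of the residual.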


6. ADDING A STREAM CONSISTING OF AN ARBITRARY PATTERN




Figure 3 showed the transitions when only the background load is present. If an extra full cell is present in the additional stream, the transitions are modified to those shown in Figure 4. If every slot carried an extra cell this would obviously not be a stable situation, as the queue length probabilities would increase without limit; interspersed empty cells are required in the additional stream to allow the queue length to reduce. Note that for simplicity of explanation we are still working with the case of 2 random sources forming the background load. The Poisson case works in exactly the same manner, except that the probabilities that more than two background cells arrive are non-zero. So, in the case of no extra full cell arriving,

$$s'_n = \sum_{i=0}^{n+1} s_{n+1-i}\, p_i \qquad (n \ge 1),$$

with $s'_0 = s_0 (p_0 + p_1) + s_1 p_0$ for the empty queue, and if there is an additional full cell

$$s'_n = \sum_{i=0}^{n} s_{n-i}\, p_i,$$

assuming that a cell arriving at an empty queue can be served in the same time slot.

Figure 4 Background+Extra Cell

We now have all the components necessary to animate the buffer state probabilities. Starting from any initial set of state probabilities, the appropriate transitions are applied: in the simplified case we have been considering, those of Figure 4 for every full cell in the additional stream, and those of Figure 3 when there is no extra full cell in the additional stream. This yields a new set of state probabilities, and the process can be repeated using that final set as the new starting point. Eventually the final state probability vector will be the same as the starting vector. How many iterations this takes depends on how near the final answer is to the starting condition. An obvious choice is to start with the fill probabilities due to the background stream alone, derived either iteratively as shown in section 3 above or by an explicit solution.
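The two kinds of slot can be sketched as functions on the truncated vector of fill probabilities. This is a hedged illustration for the two-source background and seven states (fill 0 to 6) of the example that follows; `slot_empty` and `slot_full` are hypothetical names, the empty case extrapolates $Z^+ = Z^2/Y$ as in section 4, and probability pushed beyond the last state by a full cell is simply dropped.

```python
def slot_empty(s, p):
    """No extra cell: s'_n = sum_i s_{n+1-i} p_i, and s'_0 = s_0 (p_0 + p_1) + s_1 p_0."""
    N = len(s) - 1
    ext = s + [s[N] ** 2 / s[N - 1] if s[N - 1] > 0 else 0.0]   # Z+ = Z^2 / Y
    new = [ext[0] * (p[0] + p[1]) + ext[1] * p[0]]
    for n in range(1, N + 1):
        new.append(sum(ext[n + 1 - i] * p[i] for i in range(min(n + 2, len(p)))))
    return new


def slot_full(s, p):
    """Extra full cell: s'_n = sum_{i=0}^{n} s_{n-i} p_i (a cell reaching an
    empty queue is served in the same slot)."""
    return [sum(s[n - i] * p[i] for i in range(min(n + 1, len(p))))
            for n in range(len(s))]


p = [0.5625, 0.375, 0.0625]          # two background sources, 0.25 each
r = p[2] / p[0]                      # background-only fill ratio, 1/9
initial = [(1 - r) * r**n / (1 - r**7) for n in range(7)]
after_full = slot_full(initial, p)   # compare with slot 1 of Table 2A
after_empty = slot_empty(after_full, p)   # compare with slot 2 of Table 2A
```

Starting from the background-only distribution, these first two results agree with slots 1 and 2 of Table 2A to the three figures printed there.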

Taking the simple example considered previously together with an additional stream of 1 full cell, 2 empty cells, 2 full cells and, finally, 4 empty cells, the process is as shown in Table 2A. For clarity the animations of full cells are shaded. The first column gives the fill probabilities due to the background load, calculated explicitly. This is then operated on by the transitions appropriate to an additional full cell arriving, followed by those for the two empty cells, and so on. The extrapolated element $Z^+$ is shown for the slots in which it is calculated, i.e. those with an empty cell in the additional stream.


Slot            Initial   1         2         3         4         5         6         7         8         9
Add. pattern              Full      Empty     Empty     Full      Full      Empty     Empty     Empty     Empty
Fill 0          8.89E-01  5.00E-01  6.87E-01  7.75E-01  4.36E-01  2.45E-01  4.44E-01  5.84E-01  6.80E-01  7.45E-01
Fill 1          9.88E-02  3.89E-01  2.33E-01  1.68E-01  3.85E-01  3.80E-01  2.98E-01  2.35E-01  1.92E-01  1.62E-01
Fill 2          1.10E-02  9.88E-02  6.75E-02  4.60E-02  1.37E-01  2.49E-01  1.70E-01  1.19E-01  8.47E-02  6.16E-02
Fill 3          1.22E-03  1.10E-02  1.10E-02  9.02E-03  3.28E-02  9.41E-02  6.48E-02  4.51E-02  3.17E-02  2.25E-02
Fill 4          1.35E-04  1.22E-03  1.22E-03  1.22E-03  6.95E-03  2.48E-02  1.82E-02  1.31E-02  9.48E-03  6.85E-03
Fill 5          1.50E-05  1.35E-04  1.35E-04  1.35E-04  1.10E-03  5.27E-03  4.05E-03  3.08E-03  2.33E-03  1.74E-03
Fill 6          1.67E-06  1.50E-05  1.50E-05  1.50E-05  1.35E-04  9.22E-04  7.66E-04  6.22E-04  4.96E-04  3.91E-04
Z+                                  1.66E-06  1.67E-06                      1.61E-04  1.45E-04  1.25E-04  1.06E-04

Table 2A Initial Animation of Background + Pattern

The  final  column  indicates  the  fill  probabilities  after  the  pattern  has  passed  through. 
These  probabilities  may  be  substituted  as  a  new  set  of  initial  conditions  and  the  process 
repeated.  Eventually  (after  10  cycles  in  this  case)  the  initial  and  final  states  are  the  same,  as 
shown  in  table  2B. 
Slot            Initial   1         2         3         4         5         6         7         8         9
Add. pattern              Full      Empty     Empty     Full      Full      Empty     Empty     Empty     Empty
Fill 0          6.85E-01  3.86E-01  5.59E-01  6.60E-01  3.71E-01  2.09E-01  3.86E-01  5.18E-01  6.15E-01  6.85E-01
Fill 1          1.68E-01  3.52E-01  2.40E-01  1.89E-01  3.53E-01  3.38E-01  2.78E-01  2.29E-01  1.94E-01  1.68E-01
Fill 2          7.84E-02  1.50E-01  1.13E-01  8.46E-02  1.60E-01  2.45E-01  1.80E-01  1.34E-01  1.02E-01  7.84E-02
Fill 3          3.79E-02  6.12E-02  4.86E-02  3.77E-02  6.47E-02  1.18E-01  8.81E-02  6.62E-02  4.99E-02  3.79E-02
Fill 4          1.73E-02  2.88E-02  2.21E-02  1.70E-02  2.90E-02  5.05E-02  3.89E-02  2.97E-02  2.27E-02  1.73E-02
Fill 5          7.76E-03  1.32E-02  1.01E-02  7.73E-03  1.31E-02  2.23E-02  1.72E-02  1.32E-02  1.01E-02  7.77E-03
Fill 6          3.47E-03  5.94E-03  4.56E-03  3.50E-03  5.93E-03  1.01E-02  7.71E-03  5.92E-03  4.53E-03  3.47E-03
Z+                                  2.67E-03  2.06E-03                      4.54E-03  3.47E-03  2.65E-03  2.03E-03

Table 2B Animation to Stability of Background + Pattern

We  have  now  reached  a  stable  set  of  state  probabilities.  A  more  graphical  illustration 
of  the  process  can  be  seen  in  figure  5.  This  shows  how  the  state  probabilities  vary 
during  the  iterative  process,  starting  from  the  initial  condition  until  stability  is  reached. 

All  that  is  now  necessary  is  to  calculate  the 
mean  probability  for  each  state  by  taking  the 
average  value  of  each  row  in  the  stable  state  in 
table  2B;  of  course  the  initial  and  final 
conditions  should  only  be  included  once.  The 
result  is  shown  in  table  3.  In  order  to  estimate 
the  equivalent  bandwidth  it  is  only  necessary  to 
find  the  random  load  which  approximates  to 
these  buffer  fill  probabilities.  Any  excess  over 
0.5  (the  background  load)  will  be  the  equivalent 
bandwidth. 


State    Probability
0        4.88E-01
1        2.60E-01
2        1.39E-01
3        6.36E-02
4        2.85E-02
5        1.27E-02
6        5.74E-03

Table 3 Mean State Probabilities

Figure 5 Animation to Stability - Background + Pattern (x axis: iterations)
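Animating a whole cycle to stability and averaging over it can be sketched as follows; again this is an illustration under stated assumptions, not the author's code. The transitions repeat those of section 6, the vector is renormalised after every slot in the spirit of section 5, `animate_pattern` and `mean_fill` are hypothetical names, and the pattern is the one of Tables 2A and 2B (1 = full, 0 = empty).

```python
def slot_empty(s, p):
    N = len(s) - 1
    ext = s + [s[N] ** 2 / s[N - 1] if s[N - 1] > 0 else 0.0]   # Z+ = Z^2 / Y
    new = [ext[0] * (p[0] + p[1]) + ext[1] * p[0]]
    new += [sum(ext[n + 1 - i] * p[i] for i in range(min(n + 2, len(p))))
            for n in range(1, N + 1)]
    return new


def slot_full(s, p):
    return [sum(s[n - i] * p[i] for i in range(min(n + 1, len(p))))
            for n in range(len(s))]


def animate_pattern(s, p, pattern, max_cycles=500, tol=1e-12):
    """Repeat the pattern until the end-of-cycle vector equals the start vector.

    Returns the fill vectors after each slot of the final (periodic) cycle.
    """
    for _ in range(max_cycles):
        start, columns = s, []
        for full in pattern:
            s = slot_full(s, p) if full else slot_empty(s, p)
            total = sum(s)
            s = [x / total for x in s]          # keep the total at unity
            columns.append(s)
        if max(abs(a - b) for a, b in zip(s, start)) < tol:
            return columns
    raise RuntimeError("pattern animation did not reach a periodic state")


def mean_fill(columns):
    """Mean probability of each fill over one cycle, each slot counted once."""
    return [sum(col[n] for col in columns) / len(columns)
            for n in range(len(columns[0]))]


p = [0.5625, 0.375, 0.0625]
r = p[2] / p[0]
background = [(1 - r) * r**n / (1 - r**7) for n in range(7)]
pattern = [1, 0, 0, 1, 1, 0, 0, 0, 0]
columns = animate_pattern(background, p, pattern)
means = mean_fill(columns)
```

Because every slot is renormalised here, the results should differ from Tables 2B and 3 only by a fraction of a percent.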
7. PRACTICAL RESULTS


Let us look at the results for the MAP mentioned at the very beginning (i.e. 10 full, 25 empty, 11 full, 26 empty, 11 full, 26 empty, 5 full and 133 empty cells). Note that in this case the large number of empty cells at the end of the pattern means that the queue length probabilities will have returned to the level due to the background load, so a second cycle is not needed. Indeed, for other patterns we considered that ended with several thousand empty cells, the animation was discontinued once the probabilities had reached a stable state.

The tabulation of the resulting probability distribution shows that a buffer fill of 2$ occurs with a probability of about $10^{-8}$. From Figure 6 this is equivalent to a Poisson load just over 0.7; more careful examination gives a figure of 0.714. Figure 7 shows the queue length probability distribution due to a Poisson load of 0.5 together with the distribution from the animation with the extra stream. It can be seen that the distribution due to the combined load approximates that from a purely Poisson load. Thus it can be argued that the additional pattern contributes a load equivalent to a Poisson load of 0.714 - 0.5 = 0.214, representing the equivalent bandwidth.

Figure 7 Comparison of Probabilities (x axis: queue length)
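Finding the purely Poisson load that matches the animated distribution at the comparison point can be sketched as a bisection. This is a hedged stand-in for reading the value off Figure 6: it swaps in the explicit forward recursion for the Poisson-only queue (quick, and adequate at the $10^{-8}$ level, although it carries the residual constant discussed in section 5), takes the comparison to be the probability of a single fill level k, and uses hypothetical names `poisson_queue` and `equivalent_load`.

```python
import math


def poisson_queue(load, n_max):
    """Stationary fill distribution (after service) for Poisson batch arrivals
    with one cell served per slot, from the explicit forward recursion."""
    a = [math.exp(-load) * load**k / math.factorial(k) for k in range(n_max + 2)]
    pi = [0.0] * (n_max + 1)
    pi[0] = (1.0 - load) / a[0]                 # throughput 1 - pi_0 a_0 = load
    if n_max >= 1:
        pi[1] = pi[0] * (1.0 - a[0] - a[1]) / a[0]
    for n in range(1, n_max):
        # balance for state n: pi_n = sum_{j=0}^{n+1} pi_j a_{n+1-j}
        inflow = sum(pi[j] * a[n + 1 - j] for j in range(n + 1))
        pi[n + 1] = (pi[n] - inflow) / a[0]
    return pi


def equivalent_load(target_prob, k, lo=0.01, hi=0.99, iterations=60):
    """Poisson load whose probability of fill k equals target_prob."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if poisson_queue(mid, k)[k] < target_prob:
            lo = mid        # too little queuing: the load must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


dist = poisson_queue(0.5, 60)
target = poisson_queue(0.714, 25)[25]
load = equivalent_load(target, 25)
```

In use, `target_prob` would be the probability read from the animated background-plus-pattern distribution at fill k, and the equivalent bandwidth is the returned load minus the background load; here the search is only checked by a round trip at 0.714.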
8. EQUIVALENT BANDWIDTH: SENSITIVITY TO DERIVATION

In deriving a figure for the equivalent bandwidth, two plausible but arbitrary parameters were introduced: the background level of 0.5 and the comparison value of $10^{-8}$. Inspection of Figure 7 shows that a higher comparison value would have resulted in a higher value for the equivalent bandwidth. Figure 8 summarises the variation for different values of the two parameters, using the MAP mentioned above. It can be seen that higher comparison values lead to higher estimates of the equivalent bandwidth, implying that the probability distribution function continues to be of the form shown in Figure 7. At higher values of the background load the apparent equivalent bandwidth falls. This is inevitable because, in the limit, queues become very large, virtually all cells are full, and the details of the distribution are of little consequence. As the combined load must be less than 1 and the pattern in question contributes 0.1, the trend is indicated by the dotted lines extending to the right. At low background levels the queuing tends to that due to the MAP alone, and as this is a single source there will be no queuing and hence zero equivalent bandwidth. It can be seen that it is not possible to ascribe an exact figure to the equivalent bandwidth derived by these means. However, working in the chosen region gives a figure that is a good indication of the impact of the pattern on the network buffers.

Figure 8 Equivalent Bandwidth: derivation sensitivity


9.  PATTERN  CELL  POSITION  PROBABILITY  DISTRIBUTION 

Let us return once again to the simple example shown in Table 2B. The columns give the queue length probabilities, but there is no indication of where in the queue the pattern cell might be. If for some reason the pattern cell is always served after the background cells, then the first pattern cell will have to wait one extra time slot before it is served with probability 0.352 (from Table 2B), two extra slots with probability 0.150, and so on. The other pattern cells are delayed in the same way. There is always a probability that there is no background cell and the pattern cell is the only one in the queue, in which case it is served immediately; this arises with probability 0.386. The whole process can be tabulated as shown in Table 3.


Slot        1         2         3         4         5         6         7         8         9
1st cell    3.86E-01  3.52E-01  1.50E-01  6.12E-02  2.88E-02  1.32E-02  5.94E-03  0         0
2nd cell    5.93E-03  0         0         3.71E-01  3.53E-01  1.60E-01  6.47E-02  2.90E-02  1.31E-02
3rd cell    2.23E-02  1.01E-02  0         0         2.09E-01  3.38E-01  2.45E-01  1.18E-01  5.05E-02
Total       4.14E-01  3.62E-01  1.50E-01  4.32E-01  5.91E-01  5.11E-01  3.16E-01  1.47E-01  6.36E-02

Table 3 Cell Delay Distribution - Pattern cell served last


The table shows that the original pattern cell, which had a probability of 1 of being in a particular slot, may now occur in one of several slots with appropriate probabilities. The shading indicates the position of the original pattern cells. This is a sort of convolution, but the delay probability distribution varies from cell to cell. As the sequence is repetitive, probabilities wrap round to the start. The bottom row shows the probability of there being a pattern cell in any slot. Figure 9 illustrates the effect graphically; the clear columns are the original pattern cell positions and the shaded area represents the pattern cell position probability. The different delay variations for the different cells can clearly be seen.

Figure 9 Cell Probabilities

It may be argued that the assumption that the pattern cells are served last is unrealistic. The converse assumption, that the pattern cell is served before any background cells arriving in that slot, is also easy to evaluate. In that case the first pattern cell is confronted by the queue length probability distribution due to the cells that arrived previously. Thus the first pattern cell will not be delayed with probability 0.685, will be delayed by one slot with probability 0.168, and so on. Table 4 gives the corresponding figures.


Slot        1         2         3         4         5         6         7         8         9
1st cell    6.85E-01  1.68E-01  7.84E-02  3.79E-02  1.73E-02  7.76E-03  3.47E-03  0         0
2nd cell    3.50E-03  0         0         6.60E-01  1.89E-01  8.46E-02  3.77E-02  1.70E-02  7.73E-03
3rd cell    1.31E-02  5.93E-03  0         0         3.71E-01  3.53E-01  1.60E-01  6.47E-02  2.90E-02
Total       7.01E-01  1.73E-01  7.84E-02  6.98E-01  5.77E-01  4.45E-01  2.02E-01  8.17E-02  3.67E-02

Table 4 Cell Delay Distribution - Pattern cell served first
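Tables 3 and 4 amount to a cyclic shift of the Table 2B columns: a pattern cell in slot m with j cells ahead of it is served in slot m + j, wrapping round the nine-slot cycle. A hedged sketch follows; `cell_positions` is a hypothetical name, slots are numbered from 0 in the code (paper slot 1 is index 0), and the inputs are the published Table 2B figures, so the totals can only agree to about three figures.

```python
def cell_positions(cells, n_slots):
    """cells: list of (arrival_slot, delay_distribution) pairs.

    delay_distribution[j] is the probability that the cell is served j slots
    after its arrival slot; positions wrap round the repetitive pattern.
    Returns one row per cell plus the column totals.
    """
    rows = []
    for slot, delays in cells:
        row = [0.0] * n_slots
        for j, prob in enumerate(delays):
            row[(slot + j) % n_slots] += prob
        rows.append(row)
    totals = [sum(column) for column in zip(*rows)]
    return rows, totals


# Fill distributions (0..6) from Table 2B, keyed by paper slot number;
# key 0 is the initial column (identical to the state after slot 9).
TABLE_2B = {
    0: [6.85e-1, 1.68e-1, 7.84e-2, 3.79e-2, 1.73e-2, 7.76e-3, 3.47e-3],
    1: [3.86e-1, 3.52e-1, 1.50e-1, 6.12e-2, 2.88e-2, 1.32e-2, 5.94e-3],
    3: [6.60e-1, 1.89e-1, 8.46e-2, 3.77e-2, 1.70e-2, 7.73e-3, 3.50e-3],
    4: [3.71e-1, 3.53e-1, 1.60e-1, 6.47e-2, 2.90e-2, 1.31e-2, 5.93e-3],
    5: [2.09e-1, 3.38e-1, 2.45e-1, 1.18e-1, 5.05e-2, 2.23e-2, 1.01e-2],
}
pattern_slots = [1, 4, 5]            # paper slots holding pattern cells

# Served last: the cell sees the fill left at the end of its own slot.
_, last_totals = cell_positions([(m - 1, TABLE_2B[m]) for m in pattern_slots], 9)
# Served first: the cell sees only the fill left by the previous slot.
_, first_totals = cell_positions([(m - 1, TABLE_2B[m - 1]) for m in pattern_slots], 9)
```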
The more realistic case of the pattern cell being served at random amongst the background cells is more complex. With N background cells arriving there are N + 1 places that a pattern cell might find itself, all equally likely. Hence if N background cells arrive with probability $p_N$, the probability that exactly n background cells precede the pattern cell is $p_N/(N+1)$, $0 \le n \le N$. The total probability of exactly n background cells before the pattern cell, $b_n$, is found by summing over all N:

$$b_n = \sum_{N=n}^{\infty} \frac{p_N}{N+1}$$

Taking the simple case we have been pursuing, and the figures from section 3:

$$b_0 = p_0 + p_1/2 + p_2/3 = 0.7708$$
$$b_1 = p_1/2 + p_2/3 = 0.2083$$
$$b_2 = p_2/3 = 0.0208$$


These  probabilities  may  then  be  applied  in  the  manner  of  Figure  4  to  the  queuing 
probabilities  before  the  Pattern  Cell  arrival  to  give  the  queue  length  as  seen  by  the 
arriving  pattern  cell.  The  results  are  shown  in  Table  5. 
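The random-position case can be sketched in the same style: compute $b_n$ from the background arrival probabilities, then convolve the fill seen before the pattern cell arrives with $b_n$, just as the full-cell transition of section 6 convolves with $p_i$. The function names are hypothetical, and the check uses the initial column of Table 2B.

```python
def precede_probs(p):
    """b_n = sum_{N >= n} p_N / (N + 1): probability that exactly n background
    cells arriving in the same slot are queued ahead of the pattern cell."""
    return [sum(p[N] / (N + 1) for N in range(n, len(p))) for n in range(len(p))]


def seen_at_random(s, b):
    """Fill seen by a pattern cell placed at random among the slot's arrivals:
    s^r_n = sum_{i=0}^{n} s_{n-i} b_i, truncated to the length of s."""
    return [sum(s[n - i] * b[i] for i in range(min(n + 1, len(b))))
            for n in range(len(s))]


p = [0.5625, 0.375, 0.0625]
b = precede_probs(p)
initial_2b = [6.85e-1, 1.68e-1, 7.84e-2, 3.79e-2, 1.73e-2, 7.76e-3, 3.47e-3]
modified = seen_at_random(initial_2b, b)      # first "Random" column of Table 5
```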


        Initial               Slot 3                Slot 4
Fill    Table 2B  Random      Table 2B  Random      Table 2B  Random
0       6.85E-01  5.28E-01    6.60E-01  5.09E-01    3.71E-01  2.86E-01
1       1.68E-01  2.72E-01    1.89E-01  2.83E-01    3.53E-01  3.49E-01
2       7.84E-02  1.10E-01    8.46E-02  1.18E-01    1.60E-01  2.05E-01
3       3.79E-02  4.90E-02    3.77E-02  5.06E-02    6.47E-02  9.05E-02
4       1.73E-02  2.29E-02    1.70E-02  2.27E-02    2.90E-02  3.92E-02
5       7.76E-03  1.04E-02    7.73E-03  1.03E-02    1.31E-02  1.75E-02
6       3.47E-03  4.65E-03    3.50E-03  4.66E-03    5.93E-03  7.90E-03

Table 5 Figures of Table 2B Modified to Case where Pattern Cell is Queued at Random

These may be rearranged in the same manner as Tables 3 and 4 to obtain the cell delay distributions. Figure 10 compares the results of the 3 environments. The tall columns show the original positions of the 3 cells; the other columns show the pattern cell position probabilities for the first-in-the-queue, last-in-the-queue and random-position environments. Not surprisingly, the random environment produces a result between the first and the last in magnitude.

Figure 10 Comparison of Effect of Pattern Cell Priority
10. EQUIVALENT BANDWIDTH OF DISTRIBUTED CELL PROBABILITIES


The  technique  described  in  section  6  above  assumes  that  there  either  ‘is’  or  ‘is  not’  a 
cell  present.  Extending  this  to  the  environment  where  a  pattern  cell  is  present  with  a 
given  probability  presents  no  difficulty. 

Let c be the probability of a pattern cell. If the probability of n background cells is $p_n$, then the probability of n cells from the pattern and the background together is $pc_n = p_n (1 - c) + p_{n-1} c$. It is easiest to separate the queuing into two processes: arrival and serving. Given a vector representing the first N + 1 terms ($s_0 \ldots s_N$) of the queue length probability distribution, the first process represents the arrival of cells in the slot:

$$s'_n = \sum_{i=0}^{n} s_i\, pc_{n-i}$$

To  avoid  the  problem  of  rounding  errors  the  correction  process  described  in  section  5 
may  be  applied  at  this  point. 

The next process represents the serving of the queue: $s''_0 = s'_0 + s'_1$; for n = 1 to N - 1, $s''_n = s'_{n+1}$; and, using the result of section 4, $s''_N = (s'_N)^2 / s'_{N-1}$. This new vector S'' represents the new queue length probability distribution, S.
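The arrival and serving steps can be sketched directly from the formulas above. This is a hedged sketch, not the author's program; `arrival_pmf`, `arrive`, `serve` and `slot` are hypothetical names. For c = 1 and c = 0 it should reproduce, below the deepest state, the full-cell and empty-cell transitions of section 6; the deepest state differs slightly because here the extrapolation is applied to s' rather than to s.

```python
def arrival_pmf(p, c):
    """pc_n = p_n (1 - c) + p_{n-1} c: background arrivals plus a pattern cell
    present with probability c."""
    pc = [0.0] * (len(p) + 1)
    for n, pn in enumerate(p):
        pc[n] += pn * (1.0 - c)
        pc[n + 1] += pn * c
    return pc


def arrive(s, pc):
    """s'_n = sum_{i=0}^{n} s_i pc_{n-i} for n = 0..N (states above N dropped)."""
    return [sum(s[i] * pc[n - i] for i in range(n + 1) if n - i < len(pc))
            for n in range(len(s))]


def serve(s1):
    """One cell served: s''_0 = s'_0 + s'_1, s''_n = s'_{n+1}, and the missing
    s'_{N+1} is extrapolated geometrically as s'_N^2 / s'_{N-1}."""
    N = len(s1) - 1
    s2 = [s1[0] + s1[1]] + [s1[n + 1] for n in range(1, N)]
    s2.append(s1[N] ** 2 / s1[N - 1] if s1[N - 1] > 0.0 else 0.0)
    return s2


def slot(s, p, c):
    return serve(arrive(s, arrival_pmf(p, c)))


p = [0.5625, 0.375, 0.0625]
initial_2b = [6.85e-1, 1.68e-1, 7.84e-2, 3.79e-2, 1.73e-2, 7.76e-3, 3.47e-3]
slot1_2b = [3.86e-1, 3.52e-1, 1.50e-1, 6.12e-2, 2.88e-2, 1.32e-2, 5.94e-3]
full = slot(initial_2b, p, 1.0)      # compare with Table 2B, slot 1
empty = slot(slot1_2b, p, 0.0)       # compare with Table 2B, slot 2
half = slot(initial_2b, p, 0.5)
```

Because the update is linear in c for all but the extrapolated state, a half-probability cell gives the average of the two extreme cases.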
Derivation of the mean buffer fill and the equivalent bandwidth may then proceed exactly as described in section 6. A new cell position probability distribution and equivalent bandwidth may also be calculated. If $c_m$ was the relevant cell probability in slot m, with a total number of slots M (slots 0 to M - 1), then the new probability distribution is found from:

$$c'_m = \sum_{j=0}^{N} s_j\, c_{(m-j)^*}$$


The  *  attached  to  the  (m-j)  term  is  to  indicate  that  if  the  term  becomes  negative  then  M 
should  be  added  to  it  to  take  into  account  the  wrapping  effect  of  the  repetitive  pattern. 
If $s_j$ is used, the pattern cell is assumed to be last in the queue; $s'_j$ can be used if the first-in-the-queue result is required.

Figure 11 Queuing with Distributed Cell Probabilities

If the random case is required, a special version of s must be calculated after F is calculated, as discussed in section 9. Remembering that s is different for each m, in practical calculations it is simplest to follow the practice of Figure 11, accumulating the new cell position probabilities for every value of m.
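Combining the pieces for one queuing stage, a sketch might animate the fractional-cell queue round the cycle until it repeats, keep the end-of-slot fill $s_j$ for every slot, and accumulate $c'_m$ slot by slot as in Figure 11 (the "last in the queue" case). Feeding $c'$ back in as the next stage's pattern then gives a series of queues. All names are hypothetical, the background is the two-source example, and renormalisation after each slot stands in for the section 5 correction.

```python
def arrival_pmf(p, c):
    pc = [0.0] * (len(p) + 1)
    for n, pn in enumerate(p):
        pc[n] += pn * (1.0 - c)
        pc[n + 1] += pn * c
    return pc


def slot(s, p, c):
    """Arrival then service for one slot, renormalised to unit total."""
    pc = arrival_pmf(p, c)
    s1 = [sum(s[i] * pc[n - i] for i in range(n + 1) if n - i < len(pc))
          for n in range(len(s))]
    N = len(s1) - 1
    s2 = [s1[0] + s1[1]] + s1[2:N + 1]                       # s''_n = s'_{n+1}
    s2.append(s1[N] ** 2 / s1[N - 1] if s1[N - 1] > 0.0 else 0.0)
    total = sum(s2)
    return [x / total for x in s2]


def periodic_fills(c, p, n_states=7, max_cycles=1000, tol=1e-12):
    """End-of-slot fill distributions for every slot once the cycle repeats."""
    s = [1.0] + [0.0] * (n_states - 1)
    for _ in range(max_cycles):
        start, fills = s, []
        for cm in c:
            s = slot(s, p, cm)
            fills.append(s)
        if max(abs(a - b) for a, b in zip(s, start)) < tol:
            return fills
    raise RuntimeError("no periodic state reached")


def queue_stage(c, p):
    """c'_m = sum_j s_j^(m-j) c_((m-j) mod M), accumulated slot by slot."""
    M = len(c)
    fills = periodic_fills(c, p)
    out = [0.0] * M
    for k, (ck, fill) in enumerate(zip(c, fills)):
        for j, sj in enumerate(fill):
            out[(k + j) % M] += ck * sj
    return out


p = [0.5625, 0.375, 0.0625]
stage0 = [1, 0, 0, 1, 1, 0, 0, 0, 0]
stage1 = queue_stage(stage0, p)
stage2 = queue_stage(stage1, p)
```

Since each fill distribution sums to one, every stage should conserve the expected number of pattern cells per cycle (three here); computing the equivalent bandwidth of each stage would then follow sections 6 and 7.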
Figure 12 Multiple Queuing Stages (background in, background out)


All results quoted here are based on the "last in the queue" result. The "first in the queue" results are similar, except that passage through a queue has less effect because of the smaller spreading of the position probabilities. By repeatedly applying the whole process, the cell distribution may be calculated as the pattern of cells passes through a series of queues, as shown in Figure 12. From simulation, and also intuitively, it is known that the cells will be distributed about their initial starting position.

One interesting result is that if the cells start fairly clumped, their equivalent bandwidth will not decrease until they have been through many queues. In fact, for the MAP mentioned several times already, the effect of passing through a queue is initially to increase the equivalent bandwidth. Several adjacent slots with fractional probabilities of being occupied by a cell have greater bandwidth than an occupied cell surrounded by empty slots during which extra cells may be served. The initial effect of the series of queues is to fill in the short gaps between the short bursts, and this increases the equivalent bandwidth. With more passages through queues the now smoothed burst of bursts begins to spread, giving a reduction in equivalent bandwidth. This qualitative description is quantified in Figure 13. The equivalent bandwidth has been calculated against a Poisson background of 0.5 and comparison values of $10^{-4}$ and $10^{-8}$. It is interesting that in this case there is not much difference between the two comparison values; presumably the pattern is "random in nature" and hence does not distort the probability distribution due to the background.

Figure 13 Effect of Several Switching Stages (equivalent bandwidth against stages of queuing; comparison values $10^{-8}$ and $10^{-4}$)

Figure 14 also shows the effect of bursts of cells passing through the series of queues. In all cases the mean rate is 0.1, but the cells are in bursts of 1, 2, 4, 8 and 16. The equivalent bandwidth has been calculated at two comparison values, $10^{-4}$ and $10^{-8}$. It can be seen that the fall-off in equivalent bandwidth with passage through several queues is less marked at the lower probability level, and that the difference between the two comparison values is much more marked.

Figure 14 Effect of Bursts Passing Through Several Switching Stages


11.  CONCLUSIONS 

It  is  relatively  easy  to  derive  the  equivalent  bandwidth  of  a  stream  consisting  of  an 
arbitrary  sequence  of  full  and  empty  cells  using  the  technique  of  animation  described 
above.  This  can  be  extended  to  the  impact  of  transit  through  several  switching  stages. 
The  figure  for  equivalent  bandwidth  is  not  an  absolute  figure  but  gives  a  guide  to  the 
impact  of  a  data  stream.  The  value  of  the  equivalent  bandwidth,  and  hence  the  impact, 
of  the  stream  is  fairly  independent  of  the  background  parameters  and  the  passage  of  the 
stream  through  the  network.  Computational  requirements  are  well  within  the 
resources  of  a  Personal  Computer,  with  computation  times  of  seconds  or  minutes. 
Abstract

The introduction of multiple bearer services with different delay characteristics is proposed. In this context statistical multiplexing can be exploited to such an extent that full loading of transmission lines is feasible without cell losses. Strict Usage / Network Parameter Control, based on the Generic Cell Rate Algorithm, is needed. Connection Admission Control can be decided by means of simple arithmetic rules. A switch architecture operating with multiple QoS classes is designed. Simulation results are presented.

Keywords 

Bandwidth Allocation, Connection Admission Control, Usage Parameter Control, Network Parameter Control, Generic Cell Rate Algorithm, ATM switch, ATM traffic simulation.


1  INTRODUCTION 

In narrowband ISDN 12 different bearer services have been defined (ITU-T, I.200 series); see e.g. (Stallings, 1990), section 6. According to original plans, the introduction of broadband ISDN would have led to a further increase in this number. The complex situation that would have resulted was avoided by the adoption of Asynchronous Transfer Mode (ATM), which is based on a single bearer service, namely cell relay. See e.g. (Handel et al, 1994), chapter 2, or (Minoli et al, 1994), chapter 5. Since then, arguments in favour of multiple bearer services have been formulated (Kroner et al, 1991). Unlike the situation in narrowband ISDN, these multiple bearer services would differ only in guaranteed Quality of Service (QoS) (ITU-T, I.356).

Assigning a higher service priority to real-time traffic (such as voice) than to non-real-time traffic (such as data) has been proposed on several occasions. See the introduction of (Lee et al, 1993) and references quoted there. As pointed out in (Kroner et al, 1991), the introduction of priorities is not consistent with the idea of a single bearer service. At least two bearer services should be offered to the subscribers, one with high quality of service and one with medium quality. They should be offered either at call level or at cell level. The cell loss priority bit in the ATM cell header offers the possibility of introducing two service classes at cell level. This approach has been explored by many authors. The present paper introduces a multiplexing scheme with multiple bearer services at call level. As a side effect of the proposal, some basic problems of ATM technology (statistical multiplexing, queueing in switches and multiplexers, connection admission control, ...) can be solved in an elegant manner.

Overload of a connectionless network leads to degradation of service for all users. In connection-oriented networks the setup of a new connection is refused when it would lead to congestion. As a consequence, quality of service can be guaranteed to all users. Of course, this requires accurate knowledge of the conditions leading to congestion. Nowadays there is a strong tendency to relax strict resource management and to replace it by self-regulating mechanisms such as discarding cells in case of buffer overflow, flags indicating congestion conditions, traffic regulating tokens, and so on. The alternative followed here is a deterministic network service (Knightly et al, 1995) in which cells are never lost and QoS is guaranteed in a deterministic way. The effect of relaxing conditions and introducing dynamic traffic control mechanisms can then be studied later as a small perturbation of a stable and well-balanced system.

The starting point of the present paper is the following observation. In the presence of nothing but Constant Bit Rate (CBR) sources the objectives

O1 full load of transmission lines

O2 no cell losses

O3 limited cell delays and low cell delay jitter

can easily be met by partitioning the available bandwidth over all sources. The addition of Variable Bit Rate (VBR) sources creates the dilemma of giving up either O1 or O2. Either the sum of all peak rates must add up to at most the total bandwidth, with a consequently far from optimal line load, or statistical multiplexing is invoked to average out the bursts, resulting in better use of the available bandwidth and occasional cell losses. The situation studied here adheres to the first option (peak cell rate allocation) but tries to make better use of the bandwidth by filling up holes in the traffic with low priority cells. For this purpose multiple traffic streams with clearly different QoS requirements are needed. In summary, instead of giving up objective O1 or O2, objective O3 is not met for at least part of the traffic (the low priority part).

In this scheme it is essential that the traffic with high priority and small delays is of the VBR type, while the traffic used to fill up the holes has constant bandwidth and suffers from rather long delays. CBR traffic with high priority is still feasible. However, it does not create any opportunity to use low priority traffic to fill up the capacity of transmission lines. The Available Bit Rate (ABR) service class enters the scheme as a low priority alternative to the CBR service. End-to-end flow control is used to avoid the large cell buffers which would otherwise be required at intermediate nodes.

The section on GCRA, shaping, and bursts fixes notations and conventions. Next the multiplexing scheme is introduced and rules for resource management are discussed. Priority classes can be organised by cascading several multiplexers. Their use is clarified by means of an example. In section 5 the architecture of a switch which implements priority classes is described. Simulation results confirm the viability of the scheme. In a final section connection admission control is discussed, some considerations are made about the cost effectiveness of the scheme, and possible ways of pricing different services are considered.


2  BURSTY  CELL  STREAMS 

Strict resource management requires strict enforcement of traffic contracts (ITU-T, I.371; ATM Forum, 1993). Traffic contract conformance is specified by means of the Generic Cell Rate Algorithm (GCRA). A cell stream is said to satisfy GCRA with cell rate $r$ and tolerance $\tau$ if the arrival time $t_n$ of the $n$-th cell is not less than the theoretical arrival time $T_n$ minus the tolerance $\tau$. The theoretical arrival time $T_n$ equals the maximum of $T_{n-1}$ and $t_{n-1}$, incremented by the inverse $1/r$ of the cell rate. In formulas:

$$t_n \ge T_n - \tau \tag{1}$$

$$T_n = \max\{T_{n-1}, t_{n-1}\} + \frac{1}{r} \tag{2}$$

with $T_0 = t_0$ for the first cell ($n = 0$). The above version of GCRA is called the virtual scheduling algorithm (an equivalent algorithm is the continuous-state leaky bucket algorithm).
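A direct transcription of (1) and (2) may help fix the notation. The sketch below checks whether a whole list of arrival times satisfies GCRA with rate $r$ and tolerance $\tau$. It is our own minimal illustration, not a policing implementation: it only reports whether every cell conforms, and the function name and the small `eps` guarding against floating-point rounding are assumptions.

```python
def satisfies_gcra(arrivals, rate, tau, eps=1e-9):
    """Check eqs. (1)-(2): t_n >= T_n - tau, T_n = max(T_{n-1}, t_{n-1}) + 1/rate.

    arrivals: non-decreasing arrival times t_0, t_1, ...
    rate:     cell rate r (cells per time unit)
    tau:      tolerance tau (time units)
    """
    if not arrivals:
        return True
    increment = 1.0 / rate
    tat = arrivals[0]  # T_0 = t_0, so the first cell always conforms
    for n in range(1, len(arrivals)):
        tat = max(tat, arrivals[n - 1]) + increment  # eq. (2)
        if arrivals[n] < tat - tau - eps:            # eq. (1) violated
            return False
    return True


print(satisfies_gcra([0, 1, 2, 3, 4, 5], 0.1, 50))     # True
print(satisfies_gcra([0, 1, 2, 3, 4, 5, 6], 0.1, 50))  # False
```

With $r$ = 0.1 (one cell per 10 time units) and $\tau$ = 50, six cells sent one time unit apart conform, but a seventh cell at time 6 does not, because $T_6 = 60$ and $6 < 60 - 50$.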

Shaping of cell streams is needed for three reasons. First, the user can shape its ATM source to ensure conformance to the traffic contract. Second, both the network and the user need shaping to remove unwanted burstiness added by the network. Finally, an argument involving the maximal length of shaping queues will be used in the estimates of queue lengths.

Consider a cell stream which satisfies GCRA with parameters $r$ and $\tau$. By means of a queueing buffer the tolerance $\tau$ of the cell stream can be reduced to a smaller value $\tau'$. The maximal number of cells in the queue is approximately $r(\tau - \tau')$. The maximal delay of a cell due to buffering is approximately $\tau - \tau'$ (both estimates are only approximate because of the discrete nature of the cell stream).
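The two shaping estimates can be illustrated with a small sketch. We assume a FIFO shaper that releases each cell at the earliest time at which the output still satisfies GCRA with rate $r$ and the reduced tolerance $\tau'$; this service discipline and the function name are our assumptions, since the text does not specify how the queueing buffer is served.

```python
def shape(arrivals, rate, tau_out):
    """FIFO shaper (assumed discipline): release each cell at the earliest time at
    which the departures still satisfy GCRA(rate, tau_out), eqs. (1)-(2).

    Returns (departures, max_delay, max_queue), where max_queue is the largest
    number of cells still waiting just after an arrival instant.
    """
    if not arrivals:
        return [], 0.0, 0
    increment = 1.0 / rate
    departures = [arrivals[0]]  # the first cell leaves at once, T_0 = d_0
    tat = arrivals[0]
    for t in arrivals[1:]:
        tat = max(tat, departures[-1]) + increment  # eq. (2) on departure times
        # eq. (1) on the output, never earlier than arrival, FIFO order kept
        departures.append(max(t, tat - tau_out, departures[-1]))
    max_delay = max(d - t for t, d in zip(arrivals, departures))
    max_queue = max(
        sum(1 for d in departures[: k + 1] if d > t)
        for k, t in enumerate(arrivals)
    )
    return departures, max_delay, max_queue


# Six back-to-back cells satisfy GCRA(r = 0.1, tau = 50); shape them to tau' = 0.
deps, max_delay, max_queue = shape([0, 1, 2, 3, 4, 5], 0.1, 0)
print(deps, max_delay, max_queue)
```

For this burst the shaper releases the cells at times 0, 10, ..., 50, so the largest delay is 45 and at most 5 cells wait, close to the estimates $\tau - \tau' = 50$ and $r(\tau - \tau') = 5$; the small differences come from the discrete cells.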

Bursty cell streams are characterised by specifying two sets of parameters for which they satisfy GCRA (ATM Forum, 1993). The first set is denoted $(r_p, \tau_p)$: $r_p$ is called the Peak Cell Rate (PCR), $\tau_p$ the Cell Delay Variation (CDV) tolerance. The other set is denoted $(r_s, \tau_s)$: $r_s$ is called the Sustained Cell Rate (SCR), $\tau_s$ the Burst Tolerance (BT). One has $r_p > r_s$, $\tau_p < \tau_s$, and $r_s \tau_s > 1$. Throughout the paper, unless specified otherwise, the CDV tolerance $\tau_p$ equals the inverse $1/R$ of the cell rate $R$ of the transmission medium. Hence, a bursty cell stream is specified by 3 parameters: $r_p$, $r_s$, and $\tau_s$. In what follows it will be called a cell stream with Variable Bit Rate (VBR). If $r_s$ and $\tau_s$ are not specified (e.g. because the cell stream is not bursty), then $r_s = r_p$ and $\tau_s = \tau_p$ are assumed. In this case it will be called a cell stream with Constant Bit Rate (CBR), although we do not require the peak cell rate $r_p$ to equal the average cell rate. Hence, in reality the cell rate could be far from constant.

It follows from the estimates quoted above that a VBR cell stream with parameters $r_p$, $r_s$, and $\tau_s$ can be transformed into a CBR cell stream with parameter $r'_p$ equal to $r_s$ using a queueing buffer of length approximately $r_s \tau_s$. The BT $\tau_s$ is often expressed in numbers of cells instead of seconds. Then the value $r_s \tau_s$ is meant, which corresponds (approximately) to the length of the buffer needed to transform the cell stream into a CBR stream.


Figure 1 Basic multiplexer scheme.


3  MULTIPLEXER 

The  multiplexer  has  a  number  of  identical  inputs  numbered  from  1  to  N  and  one  low 
priority  input  numbered  0.  The  high  priority  inputs  are  policed  to  enforce  Usage  Parameter 
Control.  See  Figure  1. 

The  load  of  the  high  priority  inputs  will  be  dimensioned  in  such  a  way  that  at  most  one 
conforming  cell  is  waiting  in  each  of  the  N  input  buffers.  The  low  priority  input  buffer 
is  served  when  no  high  priority  cells  are  present.  The  strict  policing  on  the  high  priority 
inputs  together  with  a  correct  dimensioning  of  their  usage  parameters  guarantees  that 
the  low  priority  input  receives  a  specified  bandwidth  with  an  upper  bound  for  the  delay 
of  its  cells.  Note  that  the  multiplexer  can  use  a  simple  round  robin  algorithm  to  serve  the 
non-empty  high  priority  queues,  although  some  form  of  weighted  queueing  is  needed  to 
limit  the  depth  of  the  input  buffers  to  only  one  cell. 

Let $R$ denote the cell rate of the transmission line. Let $r_p(0), \ldots, r_p(N)$ denote the peak cell rates on each of the inputs. The assumption that at most one conforming cell is waiting for transmission is fulfilled by requiring that

$$\sum_{n=1}^{N} r_p(n) \le R. \tag{3}$$

In practice, inequality (3) is conservative and can be relaxed somewhat. But then it will occasionally happen that different high priority inputs hinder each other and thereby incur extra delays. The study of this situation is outside the scope of the present paper.

The low priority cell stream is used to fill the holes in the (bursty) high priority traffic. This leads to the second requirement

$$r_p(0) + \sum_{n=1}^{N} r_s(n) \le R, \tag{4}$$

where $r_s(n)$ denotes the sustained cell rate of the $n$-th input. The buffer on input 0 stores low priority cells during bursts of the high priority input channels. If all inputs were served on an equal basis, then each input would need a buffer of a certain size to absorb its own burstiness. Instead, all these buffers are brought together as one large buffer on the low priority input. This is the essential argument used to estimate the size of the buffer on the low priority input.

If no cells may be lost, then clearly conditions (3) and (4) must be satisfied. They still allow full loading of the transmission line. Additional constraints are needed to control delays and buffer sizes. Let $\tau_s(1), \ldots, \tau_s(N)$ denote the burst tolerances of the respective inputs. Then one can show that the length of the queue of low priority cells is never larger than

$$\sum_{n=1}^{N} \tau_s(n)\, r_s(n), \tag{5}$$


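which is the total of the burst tolerances BT of the high priority inputs when expressed in numbers of cells. Hence, dividing by the peak cell rate $r_p(0)$ of the low priority input, the delay of a low priority cell is never larger than

$$\frac{1}{r_p(0)} \sum_{n=1}^{N} \tau_s(n)\, r_s(n). \tag{6}$$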


Sketch of proof. Consider two systems. Both receive exactly the same incoming cell streams, characterised by the parameters $(r_p(n), r_s(n), \tau_s(n))$, $n = 0, \ldots, N$. Without loss of generality, assume that the low priority source is CBR. In system I the high priority cell streams are first shaped into CBR cell streams with cell rates equal to $r_s(n)$. As a consequence, only CBR sources arrive at the multiplexer. Because of condition (4) the total cell rate of all sources together is not larger than the available bandwidth. Hence, the traffic can be multiplexed without cell losses and with single cell buffers at every input. System II is the priority system described in the present paper. The service disciplines of the two systems can be coupled as follows.
1 System I uses a weighted queueing discipline.

2 If high priority input $n$ of system I is served, then high priority input $n$ of system II is also served.

3 If low priority input 0 of system I is served, then an arbitrary high priority input of system II with a non-empty queue is served, if one can be found. Only if none is found is the low priority input of system II served as well.

4 If in system I no cell is ready for transmission, then in system II an arbitrary high priority input with a non-empty queue is served, if one can be found.

Clearly, system II has a better throughput than system I because of rule 4. Hence it needs at most the same amount of buffering as system I. Assume that in system II a high priority cell is served while in system I the low priority input is served. Then the arriving cell increases the length of the shaper queue at the high priority input of system I, while in system II the queue of the low priority input increases by one relative to the same queue of system I. This shows that under rule 3 there is a one-to-one coupling between the queue lengths of the shapers in system I and the queue length of the low priority input in system II. Under rule 4 system II becomes more efficient than system I. Both rules together imply inequality (5). □

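Note that (6) implies that the delay is bounded above by

$$\frac{R}{r_p(0)} \max\{\tau_s(1), \ldots, \tau_s(N)\}, \tag{7}$$

since by (4) the sustained cell rates $r_s(1), \ldots, r_s(N)$ add up to at most $R - r_p(0)$, hence to less than $R$.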
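Conditions (3) and (4) and the bounds (5) to (7) amount to a few sums. The sketch below evaluates them for one multiplexer; the `VBRInput` type and the function name are our own, hypothetical choices. As an example it uses the extreme priority class of Table 2 (100 connections with PCR 3,000, SCR 20 and BT 0.4 sec on a line with $R$ = 353,107 cells/sec), taking $r_p(0)$ equal to the 351,107 cells/sec left for the next class.

```python
from dataclasses import dataclass


@dataclass
class VBRInput:
    pcr: float  # r_p(n), cells/sec
    scr: float  # r_s(n), cells/sec
    bt: float   # tau_s(n), sec


def multiplexer_bounds(R, low_pcr, inputs):
    """Evaluate conditions (3), (4) and bounds (5)-(7) for one multiplexer."""
    cond3 = sum(x.pcr for x in inputs) <= R                 # (3)
    cond4 = low_pcr + sum(x.scr for x in inputs) <= R       # (4)
    queue_cells = sum(x.bt * x.scr for x in inputs)         # (5) low priority queue
    delay = queue_cells / low_pcr                           # (6) low priority delay
    coarse_delay = R / low_pcr * max(x.bt for x in inputs)  # (7) coarser bound
    return cond3, cond4, queue_cells, delay, coarse_delay


inputs = [VBRInput(3000, 20, 0.4)] * 100
cond3, cond4, q, delay, coarse = multiplexer_bounds(353107, 351107, inputs)
print(cond3, cond4, q, round(delay, 4), round(coarse, 3))
```

Both conditions hold (condition (4) with equality), the low priority queue stays below 800 cells, and (6) gives a delay of about 0.0023 sec, much tighter than the coarse bound (7) of about 0.40 sec.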

Several multiplexers may be cascaded by connecting the output of one multiplexer to the low priority input of the next. See Figure 2. In this way inputs are divided into classes of different priority and, correspondingly, different maximal burst tolerances and maximal cell delay jitter. Roughly, the maximal delay jitter in one class of inputs equals the maximal burst tolerance of the class one level higher in priority, multiplied by the ratio of the SCR at high priority to the PCR at low priority (both at the higher level; see (7)). The amount of traffic in one class of inputs determines the allowable difference between peak cell rates and sustained cell rates in the class of higher priority.


4  EXAMPLES 

As an example, consider a transmission line with a cell rate of $R$ = 353,000 cells/sec. Four service classes are provided, with the characteristics listed in Table 1.

Figure 2 Cascading multiplexers.

Table 1 Example of service classes (times in sec)

priority | cell delay jitter (sec) | maximal burst tolerance (sec)
(E) extreme | 0.0005 | 0.4
(H) high | 0.005 | 0.2
(M) medium | 0.3 | 10
(L) low | 5 | n/a

Extreme priority is reserved for a limited number of connections with low SCR, e.g. $r_s$ = 20 cells/sec, but high PCR (1,000 cells/sec or more). With such a connection about ten cells can be issued at PCR in the knowledge that they will receive absolute priority throughout the network. Urgent messages have interesting applications. Service messages such as flow control messages for the ABR service, call setup cells, and routing information need this kind of treatment. But user applications, e.g. in a client-server context, can also profit from urgent short messages. The maximal burst tolerance of the high priority class has been chosen with video conferencing in mind. E.g., with usage parameters $r_p$ = 20,000 cells/sec, $r_s$ = 8,000 cells/sec, and $\tau_s$ = 0.2 sec, a burst at peak rate consists of 2,666 cells and takes 0.133 sec. The maximal cell delay jitter of 5 msec is about what is acceptable for phone calls. In the class with medium priority the maximal burst tolerance is limited to 10 sec. An example of bursty traffic which needs this kind of burst tolerance is the interconnection of Local Area Networks (LANs). In principle, the low priority class should carry CBR traffic because nothing can be gained by allowing VBR. However, this service class is well suited for organising a low cost ABR service. It has a guaranteed bandwidth equal to part of the bandwidth not allocated to services of higher priority. It also has a guaranteed worst-case delay jitter.
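The burst figures in the video example can be checked with a short sketch. A burst sent at the peak rate still satisfies GCRA with rate $r_s$ and tolerance $\tau_s$ for about $\tau_s/(1/r_s - 1/r_p)$ cells. The 2,666 cells quoted above is the integer part of this value, while the integer Maximum Burst Size of the ATM Forum traffic management specification, $1 + \lfloor \tau_s/(1/r_s - 1/r_p) \rfloor$, is one cell larger. The function name is hypothetical.

```python
import math


def burst_at_peak(pcr, scr, bt):
    """Burst sent at PCR that still satisfies GCRA(scr, bt).

    Returns (cells, seconds, mbs): the continuous estimate bt / (1/scr - 1/pcr),
    its duration at the peak rate, and the integer Maximum Burst Size
    1 + floor(bt / (1/scr - 1/pcr)).
    """
    cells = bt / (1.0 / scr - 1.0 / pcr)
    return cells, cells / pcr, 1 + math.floor(cells)


cells, seconds, mbs = burst_at_peak(20000, 8000, 0.2)  # video example
print(math.floor(cells), round(seconds, 3), mbs)       # 2666 0.133 2667
print(burst_at_peak(1000, 20, 0.4)[2])                 # 9
```

For an extreme priority connection with $r_s$ = 20 cells/sec, $r_p$ = 1,000 cells/sec and $\tau_s$ = 0.4 sec the same formula gives 9 cells, in line with the "about ten cells" mentioned above.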
Table 2 Example of line load

priority level | number of connections | usage parameters | service | comments
extreme | 100 | (3,000, 20, 0.4) | VBR | control lines
  available bandwidth $R_0$ = 353,107 cells/sec
  sum of PCRs: 300,000 cells/sec (53,107 not used)
  sum of SCRs: 2,000 cells/sec (remains 351,107 cells/sec)
high | 1 | (79,650) | CBR | 450 phone calls
high | 8 | (20,000, 8,000, 0.2) | VBR | real-time video channels
high | 22 | (5,000, 2,000, 0.2) | VBR | video conference calls
  available bandwidth $R_1$ = 351,107 cells/sec
  sum of PCRs: 349,650 cells/sec (1,457 not used)
  sum of SCRs: 187,650 cells/sec (remains 163,457 cells/sec)
medium | 1 | (50,000) | CBR | 10 virtual leased lines
medium | 4 | (25,000, 5,000, 10) | VBR | network interconnects
  available bandwidth $R_2$ = 163,457 cells/sec
  sum of PCRs: 150,000 cells/sec (13,457 not used)
  sum of SCRs: 70,000 cells/sec (remains 93,457 cells/sec)
low | 1 | (88,000) | CBR | data channel
low | 1 | (2,000) | CBR | test channel
  available bandwidth $R_3$ = 93,457 cells/sec
  sum of PCRs: 90,000 cells/sec (3,457 not used)


The multiplexer requires two small and two large buffers. The queue for multiplexing high priority traffic with extreme priority traffic can be kept small, of the order of 1,000 cells, by limiting the total bandwidth assigned to connections with extreme priority and VBR service. The queue for multiplexing the medium priority inputs with the high priority cells holds of the order of 100,000 cells. The queue for the low priority traffic can become much larger. Suppose that the amount of medium priority traffic is limited to 100,000 cells/sec sustained. Even then, with the burst tolerance of 10 sec of the medium class, the length of the queue can grow to 1 million cells. However, because of the delay times involved it can be implemented using mass memory. Alternatively, if flow control is used for the medium and low priority classes, then relatively small buffers suffice.

Table  2  gives  a  snapshot  of  a  possible  loading  of  the  transmission  line.  In  the  table, 
CBR  sources  are  characterised  by  a  single  cell  rate,  VBR  sources  by  a  triple  (PCR,  SCR, 
BT). 
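The bookkeeping behind Table 2 is simple arithmetic: the bandwidth available to a class is $R$ minus the SCRs of all classes of higher priority. The sketch below, with a hypothetical function name and data layout, recomputes the summary lines of Table 2 and the nominal load of 99.0% quoted in Section 6; applied to the load of Table 3 it gives the nominal load of 97.17% mentioned for that example.

```python
def line_load_summary(R, classes):
    """Per-class bookkeeping as in Tables 2 and 3.

    classes: highest priority first; each is a list of (count, pcr, scr), with
    scr == pcr for CBR connections. Rates in cells/sec.
    Returns one row per class, (available R_a, sum of PCRs, PCR not used,
    sum of SCRs, remaining after SCRs), and the nominal load of the line.
    """
    rows = []
    available = R
    for conns in classes:
        pcr_sum = sum(n * pcr for n, pcr, scr in conns)
        scr_sum = sum(n * scr for n, pcr, scr in conns)
        rows.append((available, pcr_sum, available - pcr_sum,
                     scr_sum, available - scr_sum))
        available -= scr_sum  # R_{a+1} = R_a minus the SCRs of class a
    nominal_load = (R - available) / R
    return rows, nominal_load


table2 = [
    [(100, 3000, 20)],                                        # extreme
    [(1, 79650, 79650), (8, 20000, 8000), (22, 5000, 2000)],  # high
    [(1, 50000, 50000), (4, 25000, 5000)],                    # medium
    [(1, 88000, 88000), (1, 2000, 2000)],                     # low
]
table3 = [
    [(1, 25000, 25000), (1, 150000, 75000)],   # extreme
    [(1, 50000, 50000), (1, 103107, 50000)],   # high
    [(1, 121107, 121107), (5, 5000, 1000)],    # medium
    [(1, 2000, 2000), (5, 5000, 3000)],        # low
]
rows2, load2 = line_load_summary(353107, table2)
rows3, load3 = line_load_summary(353107, table3)
print(rows2[1], round(100 * load2, 1))  # (351107, 349650, 1457, 187650, 163457) 99.0
print(round(100 * load3, 2))            # 97.17
```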

A second example, multiplexing 16 inputs, is given in Table 3. It is much less balanced than the previous example. There is a large amount of traffic at extreme priority, which causes long delays for all traffic of lower priority. One cannot expect this traffic to pass multiple switches while still meeting the goals of Table 1. Still, the simulation results reported below show that the behaviour of the network remains predictable. Note that one of the low priority channels is VBR instead of CBR to reduce the nominal load from 100% to 97.17%.
Table 3 Second example of line load

priority level | number of connections | usage parameters | service | comments
extreme | 1 | (25,000) | CBR | 5 virtual leased lines
extreme | 1 | (150,000, 75,000, 0.006667) | VBR | ?
  available bandwidth $R_0$ = 353,107 cells/sec
  sum of PCRs: 175,000 cells/sec (178,107 not used)
  sum of SCRs: 100,000 cells/sec (remains 253,107 cells/sec)
high | 1 | (50,000) | CBR | 10 virtual leased lines
high | 1 | (103,107, 50,000, 0.01) | VBR | ?
  available bandwidth $R_1$ = 253,107 cells/sec
  sum of PCRs: 153,107 cells/sec (100,000 not used)
  sum of SCRs: 100,000 cells/sec (remains 153,107 cells/sec)
medium | 1 | (121,107) | CBR | ?
medium | 5 | (5,000, 1,000, 0.1) | VBR | ?
  available bandwidth $R_2$ = 153,107 cells/sec
  sum of PCRs: 146,107 cells/sec (7,000 not used)
  sum of SCRs: 126,107 cells/sec (remains 27,000 cells/sec)
low | 1 | (2,000) | CBR | test channel
low | 5 | (5,000, 3,000, 0.166667) | VBR | ?
  available bandwidth $R_3$ = 27,000 cells/sec
  sum of PCRs: 27,000 cells/sec (everything used)
  sum of SCRs: 17,000 cells/sec (remains 10,000 cells/sec)


5  SWITCH  ARCHITECTURE 

The  overall  architecture  of  the  switch  is  shown  in  Figure  3.  Its  components  are  input 
cards,  schedulers,  switching  fabrics,  and  output  cards.  Both  input  and  output  buffering 
are  used.  Inside  the  switching  fabrics  the  buffering  is  kept  minimal. 


5.1  Input  cards 

The traffic arriving at input card $i$ is immediately decomposed according to priority class $a$. See Figure 4. Next it passes a shaper which limits the peak cell rate of the class as a whole to $R'_a$, which equals the total bandwidth minus the nominal cell rates allocated for classes of priority higher than $a$. This is needed because (1) efficient loading of a transmission line introduces extra bursts in the low priority traffic, and (2) the high priority traffic has to be protected against bursts of low priority traffic. Consequently, no shaper is provided for the traffic of highest priority. After reshaping, the traffic is further decomposed according to destination $d$ (i.e. the number of the output card) and stored in small input queues. For convenience, these input queues are labelled $(i, a, d)$.
Figure 3 Block diagram of a switch.


5.2  Schedulers  and  switching  fabrics 

Schedulers control the dispatch of cells from the input buffers to the switching fabric. There is one scheduler for each priority class $a$ and each destination $d$. It is labelled $(a, d)$ and monitors the input queues $(i, a, d)$ of all input cards $i$. In each clock cycle at most one cell with priority $a$ and destination $d$ is given permission to enter the switching fabric. In this way the buffering inside the switching fabric is kept minimal. The schedulers use a round-robin algorithm to select the input queue which obtains permission to transfer a cell to the switching fabric. A weighted queueing algorithm would yield slightly better performance, but was discarded because of its more complex implementation.
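A round-robin scheduler $(a, d)$ of this kind might be modelled in software as follows. This is a minimal sketch under our own assumptions (one call to `grant` per clock cycle, input queues modelled as `collections.deque` objects indexed by input card); the class and method names are hypothetical and nothing here describes the actual hardware.

```python
from collections import deque


class RoundRobinScheduler:
    """Model of scheduler (a, d): at most one permission per clock cycle."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.next_input = 0  # input card examined first in the next cycle

    def grant(self, queues):
        """queues[i] is input queue (i, a, d). Remove one cell from the first
        non-empty queue at or after next_input (cyclically) and return
        (i, cell); return None if all queues are empty."""
        for k in range(self.n_inputs):
            i = (self.next_input + k) % self.n_inputs
            if queues[i]:
                self.next_input = (i + 1) % self.n_inputs
                return i, queues[i].popleft()
        return None


queues = [deque(["a1", "a2"]), deque(), deque(["c1"])]
rr = RoundRobinScheduler(len(queues))
out = [rr.grant(queues) for _ in range(4)]
print(out)  # [(0, 'a1'), (2, 'c1'), (0, 'a2'), None]
```

Advancing `next_input` past the queue just served is what makes the selection fair: a busy input card cannot be granted twice in a row while another card is waiting.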

There is one switching fabric for each priority class. It has a constant delay and is non-blocking. There are multiple paths (e.g. 2) between each input queue $(i, a, d)$ and the corresponding switching fabric of priority $a$. One way of implementing the switching fabric could be by means of $a \times d$ buses, one for each priority class and destination. Then the schedulers contain nothing more than bus arbitration logic to prevent several cells from being placed on a bus simultaneously.

5.3  Output  cards 

The cells leaving the switching fabric are fed through a shaper which limits the peak cell rate of the class as a whole to $R_{d,a}$, which is the total bandwidth of the output minus the nominal cell rate allocated for classes of higher priority. The output of the shaper feeds the output queues.

Figure 4 Input Card.

The output queues of different priorities are cascaded. See Figure 5. If the shaper of priority $a$ is empty (which implies that no cell of priority $a$ is being transferred from the switching fabric to the output card) and the queue of lower priority $a+1$ is not empty, then one cell is promoted from the $(a+1)$-queue to the entry of priority $a$. Indeed, an empty shaper means that the traffic has dropped below the allocated peak cell rate. Then it is time to insert cells of lower priority into this traffic.

The cells of highest priority do not pass through a shaper. The output of the queue of highest priority feeds the transmission line. If this queue (of length one) is empty, then a cell is taken from the queue next in priority.


6  SIMULATION  RESULTS 

We have written a program for numerical simulation of a switch with the architecture described above. Both CBR and VBR sources are simulated. The VBR sources are stochastic and seldom attain their nominal peak and sustained rates. As a consequence, in all our simulations the observed load of the transmission lines is somewhat lower than the nominal load. We started by simulating a single multiplexer in order to verify that the principles for loading transmission lines are correct and of practical use. In a second type of experiment we routed the outputs of two heavily loaded multiplexers to the inputs of a two-by-two switch that forwards half of each input to each of the outputs. In this way we could study the effect of feeding a cell stream through multiple subsequent multiplexers/switches. Technically, only one switch is simulated. Each output of the switch can be connected to any input of the same switch in order to realise more complex configurations.

Figure 5 Output Card.

Simulation results for a multiplexer with 137 inputs, loaded as described in Table 2, are found in Table 4. Simulating 32 sec of traffic takes almost 4 hours of CPU time on a DEC AlphaServer 2100 4/275. As expected, no cells are lost when the input and output buffers are dimensioned as predicted by the theoretical arguments (see below). Hence the experiment indicates that the Cell Loss Ratio (CLR) is below $10^{-7}$. From the theoretical considerations, however, we expect it to be identically zero. The average load of the transmission line turns out to be about 92.6%, lower than the nominal load of 99.0%, because the ON/OFF sources use the allocated capacity in a stochastic manner, not always at maximal rate.

The predictions quoted in Table 4 are calculated as follows. For each VBR channel $n$ in priority class $a$, estimate the number of cells $c_a(n)$ that borrow priority from lower class traffic by the burst tolerance expressed in number of cells, i.e. by $\tau_s(n) r_s(n)$. Note that $c_a(n) = 0$ for CBR connections. The sum $c_a = \sum_n c_a(n)$ is the estimated maximal number of cells that have to be buffered one stage lower in priority. According to formula (7) the quotient $c_a / R_{a+1}$ is the predicted delay jitter for traffic of lower priority $a+1$. Under the assumption that the main delay for cells of priority $a$ occurs in the output buffer of priority $a$, this is also the estimated maximal delay.
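The recipe can be written out in a few lines. Reading the predictions of Table 4 as cumulative (each class adds $c_a/R_{a+1}$ to the prediction of the class above it) reproduces the published values for the load of Table 2, and doubling the result for the two multiplexing stages traversed in the switching experiment reproduces the predictions of Table 5 for the load of Table 3. The cumulative reading and the factor of two are our interpretation of the tables, not statements from the text; the function name is hypothetical.

```python
def predicted_delays(vbr_classes, available, stages=1):
    """Predicted maximal delay per priority class (highest priority first).

    vbr_classes: per class, a list of (count, scr, bt) for its VBR connections.
    available:   R_a per class, in cells/sec.
    c_a = sum of count * tau_s * r_s; the quotient c_a / R_{a+1} is added to the
    prediction of class a to give that of class a+1. The result is multiplied
    by the number of multiplexing stages traversed (assumed).
    """
    c = [sum(n * scr * bt for n, scr, bt in conns) for conns in vbr_classes]
    preds = [0.0]
    for a in range(len(vbr_classes) - 1):
        preds.append(preds[-1] + c[a] / available[a + 1])
    return [stages * p for p in preds]


# Table 2 load, single multiplexer (compare Table 4)
t4 = predicted_delays(
    [[(100, 20, 0.4)], [(8, 8000, 0.2), (22, 2000, 0.2)], [(4, 5000, 10)], []],
    [353107, 351107, 163457, 93457])
# Table 3 load, multiplexer followed by the switch (compare Table 5)
t5 = predicted_delays(
    [[(1, 75000, 1 / 150)], [(1, 50000, 0.01)], [(5, 1000, 0.1)], [(5, 3000, 1 / 6)]],
    [353107, 253107, 153107, 27000], stages=2)
print([round(x, 4) for x in t4])  # [0.0, 0.0023, 0.1344, 2.2744]
print([round(x, 4) for x in t5])  # [0.0, 0.004, 0.0105, 0.0475]
```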

Table 4 Maximal delays observed in a multiplexer

priority class | maximal delay (sec) | prediction (sec)
extreme | 0.000018 | 0
high | 0.00011 | 0.0023
medium | 0.0344 | 0.134
low | 0.942 | 2.274

In a second type of experiment, two multiplexers are used to load each of two transmission lines to over 90%. Each multiplexer has 16 inputs and is loaded with the traffic described in Table 3. The outputs of the multiplexers are fed into a two-by-two switch. The traffic of each input of the switch is divided about equally over the two outputs. See Figure 6. The input spacers of the switch are now essential to restore the characteristics of the incoming traffic.

Figure 6 Configuration of the switch.

The configuration has been simulated for 50 sec. The maximal delays shown in Table 5 correspond to the total time between source and sink. This example shows that the predicted delays can actually be reached. By monitoring the simulated traffic we observe that all connections within the same service class suffer from the same delay jitter. This explains why reshaping on a per-class basis suffices.

Table 5 Maximal delays observed in the switching experiment

priority class | maximal delay (sec) | prediction (sec)
extreme | 0.000038 | 0
high | 0.0040 | 0.0040
medium | 0.0100 | 0.0105
low | 0.034 | 0.0475


7  EVALUATION 

7.1  Connection  Admission  Control 

How  to  decide  wether  a  new  connection  can  be  added  to  a  partially  loaded  transmission 
line?  In  the  first  place  the  availability  of  enough  bandwidth  has  to  be  checked.  In  the 
present  scheme  this  will  depend  on  the  desired  maximal  delay  and  hence  on  the  bandwidth 
still  available  in  the  suitable  priority  class.  For  a  new  connection  with  PCR  rp  and  SCR 
rs  the  criteria  (3)  and  (4)  become 

N 

rP  +  J2  rp(n)  ^ 

n=l 

N 

rs  +  T-p(O)  +  r»(n)  -  Rc, 

n=  1 

The  available  cell  rate  in  the  given  priority  class  is  denoted  Ra ,  a  =E,H,M,  or  L.  For  the 
highest  priority  class  Re  —  R,  for  other  classes  Ra  equals  R  minus  the  sum  of  sustained 
cell  rates  of  all  connections  with  higher  priority.  For  rp(0)  one  should  use  the  peak  cell 
rate  of  lower  priority  traffic.  In  practice,  rp(0)  can  be  taken  equal  to  the  Ra  of  the  priority 
class  of  one  lower  level.  It  can  be  necessary  to  adapt  the  values  of  Ra  to  make  it  possible 
for  (9)  to  be  satisfied.  In  addition,  limits  can  be  imposed  on  the  total  SCR  of  one  class  in 
order  to  guarantee  specified  maximal  delays  for  traffic  in  classes  of  lower  priority.  In  this 
way  the  bandwidth  allocation  involves  only  simple  arithmetics. 

Two  additional  properties  of  the  scheme  are: 

• a CBR connection can always be moved to a class of lower priority with weaker guarantees
on the maximal delays (this is not the case for a VBR connection);

• both PCR r_p and SCR r_s can always be reduced (while respecting r_p ≥ r_s); in
particular, if a CBR connection with PCR r_p can be admitted, then a VBR connection with the same PCR
but lower SCR can also be admitted (except when not enough queueing memory is
available; see below).

In addition to bandwidth allocation, it should be checked that there is enough free
memory to queue the cells before they are transmitted. The solution proposed here is to
allocate room for 1 cell per connection plus, in the case of a VBR connection with parameters r_p,
r_s and τ_s, an additional r_s τ_s places to be used by lower priority connections.
In this way the acceptance of VBR connections does not hinder the further allocation of
bandwidth to other connections of possibly lower priority.
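The bandwidth criterion (9) and the queue-memory reservation described above reduce to a few lines of arithmetic. The Python sketch below is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, rates are taken in cells/s and τ_s in seconds, and R_a and r_p(0) are passed in by the caller, because the text notes that the R_a values may have to be adapted rather than derived mechanically. The peak-rate criterion (8) is not modelled.

```python
from typing import Dict, List, Sequence

# Priority classes from highest to lowest, as named in the text.
PRIORITY_ORDER = ["E", "H", "M", "L"]


def available_rate(R: float, scr_by_class: Dict[str, List[float]], cls: str) -> float:
    """R_a: link rate R minus the SCRs of all connections in higher-priority classes."""
    higher = PRIORITY_ORDER[: PRIORITY_ORDER.index(cls)]
    return R - sum(sum(scr_by_class.get(c, [])) for c in higher)


def bandwidth_ok(r_s: float, r_p0: float, class_scrs: Sequence[float], R_a: float) -> bool:
    """Criterion (9): r_s + r_p(0) + sum of the SCRs already in the class <= R_a."""
    return r_s + r_p0 + sum(class_scrs) <= R_a


def buffer_places(r_s: float, tau_s: float, is_vbr: bool) -> float:
    """Queue places reserved: 1 cell per connection, plus r_s * tau_s for a VBR connection."""
    return 1.0 + (r_s * tau_s if is_vbr else 0.0)


def admit(r_s: float, r_p0: float, class_scrs: Sequence[float], R_a: float,
          tau_s: float, is_vbr: bool, free_places: float) -> bool:
    """Accept the connection only if both the bandwidth and the memory checks pass."""
    return (bandwidth_ok(r_s, r_p0, class_scrs, R_a)
            and buffer_places(r_s, tau_s, is_vbr) <= free_places)


if __name__ == "__main__":
    scrs = {"E": [10.0, 20.0], "H": [40.0]}
    R_H = available_rate(150.0, scrs, "H")   # 150 - (10 + 20)
    print(R_H)                               # 120.0
    print(admit(5.0, 30.0, scrs["H"], R_H, tau_s=0.5, is_vbr=True, free_places=10.0))  # True
```

In this sketch, turning a CBR connection into a VBR one with a lower SCR can only relax (9), but it adds the r_s τ_s reservation, which is why the memory check may still reject it.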

Of course, CAC involves many other aspects, which are beyond the scope of the present
paper.


7.2 A cost-effective solution?


Obviously, the proposed multiplexing scheme is only meaningful if the cost of memory is
smaller than the cost of transmission. More precisely, let S denote the cost to store one cell
in memory during a time equal to the maximal burst tolerance τ_s of a given priority class,
and let T denote the cost of transmitting one cell over the transmission line. A necessary
condition for the multiplexing scheme to make sense is that S is an order of magnitude
smaller than T. This condition seems to be fulfilled on long-distance connections.

The billing of a CBR service should be proportional to the cell rate r, say T × r
units per second. For a VBR service the cost of the actual cell transmission is then T r_s.
Two extra contributions have to be taken into account: the excess cell rate r_p − r_s at a
fraction λ of the cost T per cell, and the cost S r_s of storing low priority cells in a buffer
of length r_s τ_s. Hence the total cost per second for the VBR service is


$$ r_s T + (r_p - r_s)\,\lambda T + r_s S, \qquad (10) $$

which can also be written as

$$ r_s T \left( 1 + \left( \frac{r_p}{r_s} - 1 \right) \lambda + \frac{S}{T} \right). \qquad (11) $$


The VBR service should be cheaper than a CBR service at the peak rate r_p. This leads to
the condition

$$ \frac{r_p}{r_s} > 1 + \frac{S}{(1-\lambda)\,T}, \qquad (12) $$


which has useful solutions if S ≪ T. For example, with S = 0.1 × T and λ = 1/3, a VBR service
with r_p = 2.5 × r_s costs only 1.6 times, instead of 2.5 times, as much as a CBR service at the
given SCR r_s.
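Equations (10) and (12), together with the numerical example, can be checked directly. A minimal Python sketch; the function names are hypothetical and all costs are expressed in units of T.

```python
def vbr_cost(r_s: float, r_p: float, T: float, S: float, lam: float) -> float:
    """Eq. (10): cost per second of a VBR service."""
    return r_s * T + (r_p - r_s) * lam * T + r_s * S


def cbr_cost(r: float, T: float) -> float:
    """CBR billing proportional to the cell rate: T * r per second."""
    return T * r


def vbr_worthwhile(r_s: float, r_p: float, T: float, S: float, lam: float) -> bool:
    """Eq. (12): the VBR service is cheaper than a CBR service at the peak rate r_p."""
    return r_p / r_s > 1 + S / ((1 - lam) * T)


if __name__ == "__main__":
    T, S, lam = 1.0, 0.1, 1 / 3
    r_s, r_p = 1.0, 2.5
    print(vbr_cost(r_s, r_p, T, S, lam) / cbr_cost(r_s, T))  # about 1.6
    print(vbr_worthwhile(r_s, r_p, T, S, lam))               # True
```

With these parameters, (12) requires only r_p/r_s > 1.15, so the 2.5:1 peak-to-sustained ratio of the example is comfortably inside the useful region.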

Transmission of lower priority cells costs only (1 − λ)T because part of the transmission
cost (λT) is paid by the corresponding high priority cells in exchange for priority. A further
reduction in price can be considered because medium and low priority traffic is used to
fill up unused bandwidth.


7.3  Conclusions 


Statistical multiplexing of VBR sources poses the non-trivial problem of simultaneously
satisfying three objectives: efficient use of bandwidth, no cell losses, and limited cell delay
jitter.  One  solution  to  this  problem  is  the  introduction  of  multiple  bearer  services  which 
differ  only  in  quality  of  service  guarantees.  The  present  paper  uses  theoretical  arguments 
and  numerical  simulations  to  show  that  a  multiplexer  or  switch  supporting  these  multiple 
bearer  services  can  indeed  achieve  the  three  objectives  quoted  above. 
Abstract

A method of shaping the video traffic within the video encoder is proposed. In intraframe-coded
frames, where the largest number of bits is generated, the coder constrains its
bit rate through a leaky bucket mechanism. A sliding window is also used to
maximise network utilisation without violating any of the traffic parameters declared at call
set-up. The impact of the shaping mechanism on both coding and network performance is
studied. It is shown that, for video sequences with scene cuts, shaping the video traffic to a
particular peak-to-mean ratio optimises both network performance and perceived image quality.

Keywords 

Traffic  and  Congestion  Control 


1  INTRODUCTION 

ITU-T  has  proposed  ATM  as  the  mechanism  for  multiplexing/switching  in  the  future  B-ISDN 
(ITU-T  1991).  A  key  challenge  in  the  ultimate  success  of  ATM  is  to  define  and  implement  a 
congestion  control  strategy  that  provides  an  efficient  sharing  of  network  resources  among 
different services with diverse traffic characteristics. Such congestion control comprises
three parts, namely control of customers' access to network resources, policing of each
user's traffic flow, and protection of the Quality of Service (QoS) against possible
fluctuations of the traffic flow above the channel capacity.
Video services are expected to account for a large portion of the traffic handled by ATM networks.
A critical aspect of VBR video coding and transmission is the real-time constraint on VBR video
data. The network has to ensure on-time delivery of the data, while the encoder, for its part,
has to provide appropriate shaping functions in order to improve channel performance.
The encoder can use such a shaping function to regulate its traffic at the ingress of the
network.

The paper is structured as follows: Part 2 describes the policing functions, i.e. usage parameter control (UPC) methods, for
policing a service in an ATM network. Part 3 presents the UPC methods proposed for
regulating video services. Part 4 gives the impact of the proposed scheme on network
performance. Part 5 investigates its impact on the perceived image quality. Finally, conclusions
are given in Part 6.


2  POLICING  FUNCTION/USAGE  PARAMETER  CONTROL  (UPC) 

After the connection is established, the network has to monitor the conformity between the
declared and the actual cell stream parameters at the ingress of the network. This protects
the network resources from malicious or erroneous users who may exceed the
traffic volume declared at call set-up and thus overload the network. This function, called
usage parameter control (UPC), is performed at the user-network interface/network-network
interface (UNI/NNI) for each existing virtual path/virtual circuit (VP/VC), controlling its traffic
flow based on the declared traffic parameters. If a VP/VC is detected violating the agreement, its
cells can either be discarded or tagged for later discard when congestion arises. The policing
function can be characterised by the following attributes (Bae, 1991), (IEEE, 1991), (IEEE, …):

I) The UPC mechanism should be selective with respect to the policed parameters: it should be
able to distinguish traffic fluctuations during normal operation from genuine
traffic violations.

II)  It  should  respond  rapidly  to  parameter  violations. 

III)  The  mechanism  should  be  simple  and  flexible  to  implement. 

Some of the most common policing techniques involve leaky bucket and window
mechanisms. The leaky bucket (Niestegge, 1990) is a virtual buffer (bucket) with a constant
service time, as illustrated in Figure 1a. Once the buffer becomes full, a violation is detected.
The service rate of the virtual buffer corresponds to the rate to be policed (for example, the peak bit
rate (PBR) or the mean bit rate (MBR)), and the bucket depth lets the UPC algorithm tolerate fluctuations caused
by cell delay variation (CDV) or burstiness. The size of the bucket is determined by the
maximum burst length that the user is allowed to submit to the network. Another
implementation of the leaky bucket controls the traffic flow by means of tokens (Sidi,
1989). A queuing model for this method is shown in Figure 1b. An arriving cell enters the
network only after it has obtained a token from the token pool. If no tokens are available, the cell must wait in the queue
until a new token is generated. Tokens are generated at a fixed rate corresponding to the bit rate
to be policed.
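The virtual-buffer form of the leaky bucket described above can be sketched in Python as follows. This is an illustrative sketch rather than a standardised UPC algorithm: the class name is hypothetical, the bucket is assumed to drain continuously at the policed rate (in cells per time unit), and a cell that would overflow the bucket is reported as a violation and not added to it.

```python
class LeakyBucketPolicer:
    """Virtual buffer drained at the policed rate; its size sets the tolerated burst."""

    def __init__(self, rate: float, size: float) -> None:
        self.rate = rate      # policed rate in cells per time unit (e.g. PBR or MBR)
        self.size = size      # bucket depth in cells (maximum tolerated burst)
        self.level = 0.0      # current virtual-buffer content
        self.last_t = 0.0     # time of the previous arrival

    def arrive(self, t: float) -> bool:
        """Return True if a cell arriving at time t conforms, False on a violation."""
        # Drain the bucket for the time elapsed since the previous arrival.
        self.level = max(0.0, self.level - (t - self.last_t) * self.rate)
        self.last_t = t
        if self.level + 1 > self.size:
            return False      # bucket would overflow: discard or tag the cell
        self.level += 1
        return True


if __name__ == "__main__":
    policer = LeakyBucketPolicer(rate=0.5, size=2)
    print([policer.arrive(t) for t in (0.0, 0.0, 0.0, 2.0)])  # [True, True, False, True]
```

Cells spaced at the policed rate or slower always conform; back-to-back cells are tolerated only up to the bucket depth, which is the burst-tolerance role the text assigns to the bucket size.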

A window is a fixed time interval, defined as a number of time slots in an ATM VP, which is
used to measure the number of cells within this interval (Bae, 1991), (IEEE, 1991), (IEEE,
1992), (Roberts, 1992). There are two versions of window mechanisms, namely the jumping
window and the moving (sliding) window, as shown in Figure 2. The jumping window consists of
non-overlapping consecutive time intervals, in each of which the number of cells delivered by a
source is counted. A new interval starts immediately at the end of the preceding one,
and the associated counter is reset to zero. In the moving window, the window slides
continuously in time. Thus, the arrival time of each cell is stored and a counter is
incremented by one for each new arrival. Exactly T time units (the window length) after the arrival of an accepted cell,
the counter is decreased by one.
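The difference between the two window mechanisms can be seen in a short Python sketch. It assumes hypothetical function names, a limit of `limit` accepted cells per window of length T, arrival times given in sorted order, and jumping windows aligned at multiples of T; in the moving window each accepted cell is forgotten exactly T time units after it arrived, as described above.

```python
from collections import deque
from typing import Deque, List, Sequence


def jumping_window(arrivals: Sequence[float], T: float, limit: int) -> List[bool]:
    """Count cells in consecutive, non-overlapping windows [kT, (k+1)T)."""
    verdicts: List[bool] = []
    window, count = None, 0
    for t in arrivals:
        k = int(t // T)
        if k != window:           # a new interval starts: reset the counter
            window, count = k, 0
        ok = count < limit
        if ok:
            count += 1
        verdicts.append(ok)
    return verdicts


def moving_window(arrivals: Sequence[float], T: float, limit: int) -> List[bool]:
    """Each accepted cell stays in the count for exactly T time units."""
    verdicts: List[bool] = []
    accepted: Deque[float] = deque()
    for t in arrivals:
        while accepted and accepted[0] <= t - T:
            accepted.popleft()    # counter decreased T after an accepted arrival
        ok = len(accepted) < limit
        if ok:
            accepted.append(t)
        verdicts.append(ok)
    return verdicts


if __name__ == "__main__":
    cells = [8.0, 9.0, 10.0, 11.0]
    print(jumping_window(cells, T=10.0, limit=2))  # [True, True, True, True]
    print(moving_window(cells, T=10.0, limit=2))   # [True, True, False, False]
```

Four cells within four time units all pass the jumping window because they straddle an interval boundary, whereas the moving window detects the burst.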


Figure 1: Schematic representation of the leaky bucket (arrivals, queue, departing cells).


Figure 2: Schematic representation of window mechanisms (jumping window and moving window).


3  EMPLOYMENT  OF  POLICING  MECHANISMS  IN  VIDEO  SERVICES 

Much attention has been paid to the implementation of policing functions in packet video.
This is because real-time services, such as video, prefer to constrain the
generated traffic according to the declared MBR and PBR so that no penalty is imposed by
the policing function. Such a penalty may deteriorate the QoS through the discard of
cells that are important for reconstructing the pictures at the decoder. The policing
function is normally imposed at the UNI. Once a violation by a source is detected, cells from
that source may be discarded or tagged for later discard. Optionally, feedback from the UNI
to the source can be set up to regulate the traffic as required.

Ratheb (1993) has studied the impact of policing functions on video services. His studies
show that efficient policing can be achieved by restricting the PBR to a reasonable value.
Among the leaky bucket, moving window and jumping window, the leaky bucket exhibits better
performance than the window mechanisms. Policing of the MBR does not seem realistic with
any of these mechanisms because of the large bucket/window requirements: observations have
shown that cell losses occur in clusters, so that for a given policed rate the size
of the bucket/window has to be extremely large.

The imposition of UPC can badly damage the QoS of services that violate their traffic
descriptors. This is more pronounced in interframe-coded video, where the loss of one cell may
propagate through several video frames. It is therefore in the user's own interest to control the
generated bit rate before being penalised by the network operator. If the encoder codes
pictures at a lower bit rate, image quality is only temporarily degraded: since no cells are lost, the
decoder can track the encoder and the picture quality can improve later. On
the other hand, if cells are lost due to network policing, the decoder cannot track the
encoder and the picture quality will remain poor for a long time, which is very objectionable.

Harasaki  and  Yano  (1993)  have  used  a  leaky  bucket  to  police  both  PBR  and  MBR.  They  have 
demonstrated  that  the  leaky  bucket  size  should  be  quite  long  (possibly  as  long  as  several 
seconds)  but  not  prohibitively  long  from  network  designers'  point  of  view,  in  order  to  allow 
constant  picture  quality  for  most  of  the  time  during  a  long  video  program. 

Kawashima and Tominaga (1993) have used a sliding window to police the MBR. This
method exploits the variability of the bit rate under the constraints imposed by the UPC, and its influence on the
QoS was studied. It has been reported that transmission of video under this mechanism gives significantly
better image quality at scene changes than conventional constant bit rate (CBR) transmission.
However, when the sliding window is small (10 to 30 frames), image quality is very
poor in picture areas with zooming or panning. We have adopted a more integrated
approach, in which a shaping mechanism is used to police the declared traffic parameters and adjust
the actual bit stream accordingly, so that both network and codec performance are optimised.


4  IMPLEMENTATION  OF  THE  SHAPING  MECHANISM 

The proposed shaping mechanism imposes two constraints on the generated bit rate. The first
constraint deals with the shaping of the PBR and the second is concerned with the control of the MBR.
The objective of the shaping mechanism is thus the best use of the resources
provided by the network operator, while the user at the same time avoids violating the contract
declared at call set-up. These constraints are described in the following subsections.

4.1  Shaping/Smoothing  of  the  PBR 

Video  codecs  for  ATM  networks  are  VBR  oriented.  The  bit  rate  variation  is  a  function  of  scene 
content  and  motion  of  moving  objects.  The  PBR  (the  maximum  number  of  bits  in  one  frame 
period)  is  normally  generated  at  scene  cuts,  where  the  pixels  are  coded  with  an  intraframe 
method.  In  this  study,  an  H.261  standard  video  codec  was  used,  where  images  are  interframe 
coded  using  motion  compensation  for  greater  compression.  At  scene  changes,  the  encoder 
switches to intraframe mode, generating its PBR. The bit rate can be regulated by adjusting
the quantiser step size. For example, in the reference model simulation coder (RM8, 1989), the
quantiser step size can be changed at the start of each group of blocks (GOB) or after every third of
a GOB (11 macroblocks) (ITU, 1990). This technique, in conjunction with a rate-smoothing
buffer, is employed in circuit-switched network applications to deliver constant bit rate video into
the channel.

The proposed strategy for the shaping of the PBR is to employ two virtual buffers. The first
buffer behaves like a leaky bucket, and its occupancy is fed back to the encoder to
control the quantiser step size. Note that an increase (decrease) in the quantiser step size results in
a decrease (increase) in the generated bit rate. The second buffer counts the total number of
bits generated within the scene cut frame. A threshold value, s, is imposed on this counter to
control the quantiser step size further whenever necessary. The dimensions of both buffers
are equal to the PBR declared at call set-up.

The method employed in RM8 was used to detect a scene cut by comparing the variances of
intraframe and interframe coded macroblocks. If, in the first few GOBs (e.g. 2-3 GOBs),
the majority of the macroblocks are intraframe coded, it can be assumed that the whole frame
will be intraframe coded, i.e. a scene cut is detected. This introduces a delay of 1/6 to 1/4 of a frame,
corresponding to roughly 6-9 ms in 30 Hz video. At scene cuts, where the codec switches to
intraframe mode, the quantiser step size is adjusted at the start of the frame. Since the aim of
the peak constraint is to reduce the PBR, which occurs at scene cuts, the quantiser
step size has to be increased. It was found experimentally that a good starting point is to set the
quantiser step size q_p to 1.5 × q_int, where q_int is the quantiser step size during interframe
coding mode. In our experiments, q_int was set to 12. While coding scene cut frames, the
quantiser step size is adjusted every 11 macroblocks based on the fullness of the leaky bucket.
Following RM8, the quantiser step size q_sc is:


$$ q_{sc} = 2 \times \mathrm{INT}\left( \frac{32 \times b_i}{b_{\max}} \right) + 2 \qquad (1) $$


where b_i denotes the leaky bucket fullness after coding each macroblock and b_max is the
control buffer dimension determined by the targeted PBR. The initial leaky bucket content is
calculated from (1) such that the quantiser step size is q_int. The leaky bucket is filled with the
data generated at each macroblock and is emptied by PBR/396 at every macroblock
(there are 396 macroblocks in each frame). Once the buffer content reaches the threshold s, the
quantiser step size is further adjusted by:


$$ q_{sc} = q_p \times b_{av} \times \frac{1 - \dfrac{\text{coded MB}}{\text{total MB}}}{b_{\max} - \text{leaky bucket level}} \qquad (2) $$

where b_av is the target mean number of bits per frame. It was found that a threshold of s = 0.7 × b_max is a
good indication of the fullness of the virtual buffer that controls the generated bit rate in the
scene cut.
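The scene-cut control loop of this subsection can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' coder: all names are hypothetical; the bits produced per macroblock are taken as given (a real encoder would produce them with the chosen quantiser); the threshold s is tested on the second buffer (the bit counter); Eq. (2) is read as q_sc = q_p · b_av · (1 − coded MB/total MB)/(b_max − leaky bucket level), where the minus sign inside the bracket is our reading of the source; and no clamping to the H.261 quantiser range is applied.

```python
from typing import List, Sequence, Union

MB_PER_FRAME = 396   # macroblocks in a CIF frame
UPDATE_EVERY = 11    # quantiser updated every 11 macroblocks (one third of a GOB)


def q_from_bucket(b_i: float, b_max: float) -> int:
    """Eq. (1): q_sc = 2 * INT(32 * b_i / b_max) + 2."""
    return 2 * int(32 * b_i / b_max) + 2


def initial_level(q_int: int, b_max: float) -> float:
    """Smallest bucket content for which Eq. (1) yields q_int."""
    return (q_int - 2) / 2 / 32 * b_max


def q_near_full(q_p: float, b_av: float, coded_mb: int, level: float, b_max: float) -> float:
    """Eq. (2), applied once the bit counter has reached the threshold s."""
    room = max(b_max - level, 1e-9)   # guard against division by zero (assumption)
    return q_p * b_av * (1 - coded_mb / MB_PER_FRAME) / room


def scene_cut_quantisers(mb_bits: Sequence[float], pbr: float, b_av: float,
                         q_int: int, s_ratio: float = 0.7) -> List[Union[int, float]]:
    """Quantiser step in force for each macroblock of a scene-cut frame."""
    b_max = pbr                           # both buffers are dimensioned to the declared PBR
    level = initial_level(q_int, b_max)   # leaky bucket content
    generated = 0.0                       # second buffer: bits generated so far in the frame
    q_p = 1.5 * q_int                     # experimentally found starting point
    q: Union[int, float] = q_p
    out: List[Union[int, float]] = []
    for n, bits in enumerate(mb_bits):
        if n > 0 and n % UPDATE_EVERY == 0:
            if generated >= s_ratio * b_max:
                q = q_near_full(q_p, b_av, n, level, b_max)   # Eq. (2)
            else:
                q = q_from_bucket(level, b_max)               # Eq. (1)
        out.append(q)
        generated += bits
        # Fill with the generated bits, drain PBR/396 per macroblock.
        level = max(0.0, level + bits - pbr / MB_PER_FRAME)
    return out
```

For example, `scene_cut_quantisers([0] * 396, pbr=1000.0, b_av=400.0, q_int=12)` starts at q = 18 and, because the bucket drains while no bits arrive, drops to 10 at the first update.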
4.2 Control of the MBR

The  shaping  of  the  PBR  itself  is  insufficient  to  yield  a  reliable  control  mechanism  since  the  other 
important  parameter  declared  during  the  call  set-up  is  the  MBR.  The  encoder  should  employ  a 
method such that the MBR is neither underestimated nor overestimated. Underestimation would
lead to cell loss, while overestimation would mean poor utilisation of the network resources that the
user has paid for. For this purpose, a sliding window may be employed to monitor the short-term
MBR, which is used as a guideline to estimate the long-term MBR. For a target MBR of
b_av bits/frame, the expected number of bits within a window of size w frames is w × b_av. The actual
number of bits generated within this interval, w_sum, is:


$$ w_{sum} = \sum_{i=1}^{w} f_i \qquad (3) $$

where f_i is the generated bit rate (bits/frame) at frame i. Thus, at any time instant, the deviation, dev, of the
actual sum from the expected one within the window is:

$$ dev = (w \times b_{av}) - w_{sum} \qquad (4) $$


which  is  used  to  code  the  new  frame. 

To code a new frame, the window is shifted by one frame. The bit rate f_rem of the frame that drops out of
the window is added to the deviation to estimate the allowable bit rate f_new
for coding the new frame:

$$ f_{new} = dev + f_{rem} \qquad (5) $$


The quantiser step change Δq for the new frame in the window is calculated by normalising
f_new to b_av:

$$ \Delta q = \frac{b_{av} - f_{new}}{b_{av}} \qquad (6) $$

Thus the quantiser step size for the new frame, q, is derived from:

$$ q = q_{min} + \Delta q \qquad (7) $$

To preserve the characteristics of VBR coding, the upper bound of the quantiser is crucial. If
the variation in the quantiser step size is large, it may degrade the picture quality and
affect the overall consistency of picture quality. On the other hand, a small variation
in the quantiser step size may cause w_sum to exceed w × b_av. As a compromise between these
effects, Δq is limited to a maximum of four, while the lower bound of q is set to q_min. The
newly adjusted quantiser step size is used to code the next frame. No further change of the
quantiser step size is allowed within that frame, to keep the quality consistent within the frame.
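The frame-level MBR control of Eqs. (3)-(7) can be sketched as a small Python class. This is a sketch under stated assumptions, not the authors' implementation: the class name is hypothetical, Eq. (6) is read as Δq = (b_av − f_new)/b_av, the result is not rounded to an integer quantiser value, and no constraint is applied until the window is full (consistent with Section 5.2).

```python
from collections import deque
from typing import Deque, Optional


class SlidingWindowMBR:
    """Frame-level MBR control following Eqs. (3)-(7)."""

    def __init__(self, w: int, b_av: float, q_min: float, dq_max: float = 4.0) -> None:
        self.w = w              # window size in frames
        self.b_av = b_av        # target mean bits per frame
        self.q_min = q_min      # lower bound of the quantiser step size
        self.dq_max = dq_max    # Δq limited to a maximum of four
        self.frames: Deque[float] = deque()

    def record(self, bits: float) -> None:
        """Store the bits produced by the frame just coded."""
        self.frames.append(bits)
        if len(self.frames) > self.w:
            self.frames.popleft()

    def next_q(self) -> Optional[float]:
        """Quantiser step for the next frame; None while the window is still filling."""
        if len(self.frames) < self.w:
            return None
        w_sum = sum(self.frames)                  # Eq. (3)
        dev = self.w * self.b_av - w_sum          # Eq. (4)
        f_rem = self.frames[0]                    # frame about to leave the window
        f_new = dev + f_rem                       # Eq. (5): allowable bits for the new frame
        dq = (self.b_av - f_new) / self.b_av      # Eq. (6), as read here
        dq = min(dq, self.dq_max)
        return max(self.q_min, self.q_min + dq)   # Eq. (7), with q >= q_min
```

If the window has been on target, Δq = 0 and q stays at q_min; a run of oversized frames raises q, but never by more than four steps.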
5 THE IMPACT OF TRAFFIC SHAPING ON THE DECODED IMAGE
QUALITY 


A  typical  video  sequence  containing  several  scene  cuts  was  used  to  evaluate  the  effect  of  the 
traffic  shaping  on  the  decoded  image  quality.  A  fixed  quantisation  step  size  of  12  was  used  to 
code  the  sequence.  Figures  3  and  4  illustrate  the  cell  generation  and  the  PSNR  profiles 
respectively  for  the  video  sequence  under  study. 
Figure 3: Bit Rate profile of a typical video trace with several scene cuts coded with an H.261
video  codec. 
Figure 4: PSNR of a typical video trace with several scene cuts coded with an H.261 video
codec. 
At a fixed quantiser step size (q = 12) the picture quality is almost constant. Small quality
variation  is  due  to  the  scene  dependency  of  coded  video.  The  sequence  was  also  coded  under 
the  shaping  constraints.  The  impact  of  the  shaping  constraints  on  the  bit  rate  and  PSNR  is 
demonstrated  below. 

5.1  Peak  To  Mean  (P/M)  ratio 

For  a  given  MBR,  the  constraint  imposed  on  the  PBR  reduces  P/M.  Figure  5  illustrates  the  cell 
generation  profile  of  the  video  trace  when  P/M  is  reduced  from  its  unconstrained  value  (3.5)  to 
2.5.  Due  to  the  PBR  constraint,  PSNR  is  degraded  at  the  scene  cuts,  as  illustrated  in  Figure  6. 


Figure  5:  Bit  Rate  profile  of  a  typical  video  sequence  under  the  shaping  mechanism. 
Figure 6: PSNR of a typical video sequence under the shaping mechanism.
The allowed degradation is picture dependent and is subject to the visibility threshold of the
observer. The drop in the bit rate at scene cuts causes the bit rate in the subsequent frames to
remain at a high level until the MBR converges to its long-term average. The smaller the P/M
(the tighter the constraint imposed on the PBR), the worse the degradation in PSNR at scene cuts.
Since in normal TV programmes the scene cut frequency is low (one every 5-9 s (Hughes,
1993)), the constraint imposed on the PBR does not alter the MBR significantly.

5.2  Window  Size 

The window size determines the number of frames used to calculate the short-term MBR. It has
been suggested (Kawashima, 1993) that a window of 50 to 150 video
frames gives a good estimate of the MBR, while image quality does not
degrade in scenes with panning or zooming. Our results show that, for a given PBR and MBR,
different window sizes do not cause a significant variation in the overall PSNR, as
Figure 7 illustrates.


Figure  7:  Effect  of  window  size  on  the  PSNR  of  a  typical  video  under  the  shaping 
mechanism. 

No  constraints  are  imposed  on  the  bit  rate  up  to  the  point  where  the  window  is  full  assuming 
that  there  are  no  scene  cut  frames  in  this  period.  When  the  coder  starts  controlling  the  bit  rate, 
the  MBR  for  the  overall  traffic  tends  to  converge  towards  the  long  term  mean,  as  illustrated  in 
Figure  8.  In  addition,  once  the  user  selects  an  appropriate  window  size  in  the  range  of  50-150 
frames,  the  window  size  has  no  impact  on  the  generated  bit  rate  as  illustrated  in  Figure  9. 
Figure 8: Effect of the window size on the MBR of a typical video under the shaping
mechanism. 
Figure 9: Effect of the window size on the bit rate of a typical video under the shaping
mechanism. 

6  THE  IMPACT  OF  TRAFFIC  SHAPING  ON  THE  NETWORK 
PERFORMANCE 

Although the reduction of the peak bit rate in the intraframe-coded frames leads to poorer PSNR
in these frames, it is expected to ease network congestion and reduce the cell loss rate. To
study  this  improvement,  a  single  multiplex  of  eight  homogeneous  video  sources  was 
considered.  An  8-cell  size  buffer  was  used  at  the  input  of  the  multiplex  to  withstand 
simultaneous  cell  arrivals  from  the  eight  sources  (one  cell  per  channel).  A  FIFO  policy  was 
employed  to  serve  the  buffer. 
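The multiplexer used in this experiment can be sketched as a discrete-time FIFO queue: eight sources, an 8-cell buffer, and one cell transmitted per output slot. The function name and the slot model (cells arriving in a slot are queued before that slot's departure, and each source delivers at most one cell per slot) are assumptions made for illustration; they are not taken from the paper's simulator.

```python
from typing import Sequence, Tuple


def fifo_mux_losses(slots: Sequence[Sequence[int]], buffer_size: int = 8) -> Tuple[int, int]:
    """Simulate a FIFO multiplexer that transmits one cell per output slot.

    slots[k][j] is the number of cells (0 or 1) that source j delivers in slot k.
    Returns (cells_lost, cells_offered).
    """
    queue = lost = offered = 0
    for arrivals in slots:
        n = sum(arrivals)
        offered += n
        accepted = min(n, buffer_size - queue)   # cells that find a free place
        lost += n - accepted
        queue += accepted
        if queue:
            queue -= 1                           # one cell leaves per slot
    return lost, offered


if __name__ == "__main__":
    burst = [[1] * 8 for _ in range(3)]   # all eight sources send in 3 consecutive slots
    print(fifo_mux_losses(burst))         # (14, 24)
```

With an 8-cell buffer, one simultaneous cell from each of the eight sources is always absorbed, but a sustained burst from all of them loses seven cells per slot; the cell loss rate is lost/offered.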
The encoder output was recorded as bits per macroblock. Every 44 bytes of video data were
packetised into the payload of an ATM cell, and a list of interarrival times of video cells was generated.
This list was treated as a circular linked list shared by the homogeneous video sources. An
event-driven simulation was adopted to generate the traffic of each video source. For
each video source a different (randomly selected) offset point in the list was used to ensure that
the sources are not identical on a cell-by-cell basis. The distances between the starting points
were made larger than 10 frames so that the correlation between the sources' cell generation was small.
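The packetisation and source-offset steps described above can be sketched in Python. The sketch assumes hypothetical function names, that leftover bits of a macroblock wait for the next macroblock to fill a 44-byte payload, and that offsets are expressed in frames on a circular trace; rejection sampling is simply one way to enforce a minimum spacing and is not necessarily how the authors drew their offsets.

```python
import random
from typing import List, Optional, Sequence

PAYLOAD_BITS = 44 * 8   # 44 bytes of video data per ATM cell, as in the text


def cells_per_macroblock(mb_bits: Sequence[int]) -> List[int]:
    """Cells completed after each macroblock when bits are packed into 44-byte payloads."""
    out: List[int] = []
    pending = 0
    for bits in mb_bits:
        pending += bits
        n = pending // PAYLOAD_BITS
        out.append(n)
        pending -= n * PAYLOAD_BITS   # leftover bits wait for the next macroblock
    return out


def random_offsets(n_sources: int, trace_len: int, min_gap: int = 10,
                   seed: Optional[int] = None) -> List[int]:
    """Random start points on a circular trace, pairwise more than min_gap frames apart."""
    rng = random.Random(seed)
    if n_sources < 2:
        return [rng.randrange(trace_len) for _ in range(n_sources)]
    if n_sources * (min_gap + 1) >= trace_len:
        raise ValueError("trace too short for the requested spacing")
    while True:   # rejection sampling
        offs = sorted(rng.sample(range(trace_len), n_sources))
        gaps = [(offs[(i + 1) % n_sources] - offs[i]) % trace_len for i in range(n_sources)]
        if min(gaps) > min_gap:
            return offs
```

For instance, `random_offsets(8, 750, seed=1)` returns eight starting frames on a 750-frame trace, each more than 10 frames from its neighbours.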

It was observed that reducing the PBR (i.e. the P/M ratio) decreases the cell loss rate. This is because
lowering P/M reduces the burstiness of the incoming data at the intraframe-coded frames, so the
small multiplexing buffer is flooded less often. However, at much lower values of P/M, the cell loss
rate rises again. This is because, although image quality at scene cuts is impaired, the
interframe errors (due to coding distortions) in the subsequent frames remain high for a few
frames, until all the coding errors are cleared. Thus, the small multiplexing buffer is subjected to a
high flow of data for a longer time. Therefore, there should be an optimum value of P/M at which the
cell loss rate is smallest. Figure 10 illustrates the cell loss rate for various network loadings
when P/M is reduced from its unconstrained value of 3.5 to 2.0. The smaller the loading factor,
the larger the difference in cell loss ratios for different P/M ratios. For example, at 50%
network load the optimum P/M ratio for this sequence is 2.25. For this P/M ratio, the cell loss
is less than 1/6 of that of the unconstrained video. As network load increases, more cells face a full
buffer and thus the difference in cell loss ratios decreases.

The optimum value of P/M shown in Figure 10 can also be justified by investigating
the burstiness of the generated data under various P/M ratios. Here we define burstiness as the
number of generated cells per macroblock. Figure 11 illustrates the mean burstiness
for various P/M ratios, showing that a P/M of 2.25 gives the least burstiness.


Figure  10:  Cell  loss  rate  for  different  P/M  ratios  at  various  network  loads. 
Figure 11: Mean burstiness of the generated video at different P/M ratios.


7  THE  IMPACT  OF  TRAFFIC  SHAPING  ON  THE  QoS  OF  VIDEO 
SERVICES 

The video sequence with the characteristics of Figure 10 was used to evaluate the PSNR of the
coded pictures under the uncontrolled and the optimum constrained P/M ratios. As Figure 10
shows, for the uncontrolled P/M value of 3.5 and a multiplex buffer size of eight cells, the cell
loss rate at a network load of 0.5 is almost 4 × 10^-4. For the optimum constrained
P/M value of 2.25 it is nearly 6 × 10^-5, about 6 times smaller than in the
uncontrolled case. Assuming that cell losses occur in clusters and are confined within a single frame,
then for the 750-frame sequence under study the cell loss rates in that frame are almost
2.7 × 10^-1 and 4.5 × 10^-2 for the uncontrolled and optimum constrained P/M ratios respectively.
Two cases were examined: cell loss in a scene cut (intraframe-coded) frame and cell loss in an
interframe-coded frame.
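The conversion from sequence-level cell loss rates to per-frame loss fractions is a single multiplication. The Python sketch below uses the same assumption as the text (all lost cells of the sequence fall in one frame) plus an additional simplification that the text does not state explicitly: every frame carries the same number of cells. The function name is hypothetical.

```python
def per_frame_loss(clr: float, n_frames: int) -> float:
    """Loss fraction in one frame if all lost cells of the sequence fall in that frame.

    Assumes every frame carries the same number of cells, so the affected frame
    holds 1/n_frames of the sequence's cells.
    """
    return clr * n_frames


if __name__ == "__main__":
    print(per_frame_loss(6e-5, 750))    # about 0.045, i.e. 4.5 %
    print(per_frame_loss(3.6e-4, 750))  # about 0.27; a rate of exactly 4e-4 would give 0.3
```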

7.1  Cell  Loss  in  a  Scene  Cut  Frame 

Figure 12a illustrates a scene cut frame of the sequence under the uncontrolled P/M ratio. This
scene cut frame was exposed to 27% cell loss. Although the picture quality in the non-lossy areas
is good, the artefacts due to cell loss are very disturbing. Because the codec uses interframe
coding and the encoder is unaware of the cell loss, the artefacts propagate through the decoded
image sequence and can last for a long time, as shown in Figures 13a and 14a, which display the
picture one frame and five frames, respectively, after the lossy scene cut frame. These artefacts
can only be cleared once the entire frame has been updated with intraframe coded information
(Ghanbari, 1993). Figure 15 illustrates the propagation effects of cell loss for the sequence under
study. At the instant of the cell loss the image quality drops from its nominal value of 36 dB to
26 dB. It may take several frames for the decoder to recover completely from the lost cells of one
frame. For example, in the H.261 codec, where at least 3 macroblocks per frame are intraframe
coded, it may take 132 frames (roughly 5 s) before the effect of the cell loss disappears.

Figure 12 A scene cut frame with a cell loss rate of a) 27% unconstrained and b) 4.5%
optimum constrained.
Figure 13 Cell loss at one frame after the lossy scene cut frame, a) unconstrained and b)
optimum constrained.
Figure 14 Cell loss at five frames after the lossy scene cut frame, a) unconstrained and b)
optimum constrained.
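The 132-frame figure is simple arithmetic. The sketch below assumes a CIF picture of 22 x 18 = 396 macroblocks, which the text does not state, and uses the 25-30 frames/s display rates quoted in the next paragraph; the function names are illustrative only.

```python
import math

CIF_MACROBLOCKS = 22 * 18  # assumption: CIF frame, 352 x 288 pixels in 16 x 16 macroblocks


def refresh_frames(intra_mbs_per_frame: int, total_mbs: int = CIF_MACROBLOCKS) -> int:
    """Frames needed before every macroblock has been intraframe coded once."""
    return math.ceil(total_mbs / intra_mbs_per_frame)


def refresh_seconds(intra_mbs_per_frame: int, frame_rate: float) -> float:
    """Time, in seconds, until every macroblock has been refreshed."""
    return refresh_frames(intra_mbs_per_frame) / frame_rate


print(refresh_frames(3))                    # 132
print(round(refresh_seconds(3, 25.0), 2))   # 5.28
print(round(refresh_seconds(3, 30.0), 2))   # 4.4
```

At 25-30 frames/s, a lost macroblock may therefore persist for roughly 4.4 to 5.3 s in the worst case.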

The same scene cut frame under the constrained P/M ratio of 2.25 was exposed to 4.5% cell
loss. Figure 12b shows the quality of the image at the scene cut, where the cell loss first occurred.
Not surprisingly, because of the smaller cell loss rate, the picture quality is better than that of
Figure 12a. In Figure 12b, however, apart from the cell loss artefacts, the picture quality in the
non-lossy areas is poor because of the constraint on the PBR. Nevertheless, the impairments due
to the bit rate constraint (larger quantiser step size) do not appear worse than the cell loss
artefacts. One frame after the scene cut, the quantiser step size is set back to its nominal value.
Since the decoder is aware of this change, the picture quality, which was impaired by the bit rate
constraint, returns to normal. Figures 13b and 14b illustrate the pictures one frame and five
frames, respectively, after the lossy scene cut frame: the bit rate constraint distortion has been
removed, but the cell loss distortion is still present. In the non-lossy areas these pictures exhibit
the same quality as the uncontrolled case of Figure 12a. Considering that image sequences are
displayed at 25-30 frames per second, the temporary impairments due to the PBR constraint are
hardly noticeable, whereas those due to cell loss, as in the uncontrolled case, can last for a long
time, until the whole frame is updated, as shown in Figure 15. Since the cell loss in this case is
small, the PSNR of the sequence with cell loss is not significantly different from that of the
sequence without cell loss at scene cuts. Subjectively, however, the cell loss artefacts are more
disturbing than the coding distortions.


Figure  15  PSNR  of  the  reconstructed  video  after  the  occurrence  of  cell  loss  at  a  scene  cut. 
7.2 Cell Loss in a Non-scene Cut Frame

As in the scene cut experiment, it was assumed that the lost cells are confined to one interframe
coded picture. Figure 16 illustrates the PSNR of the decoded sequence, and Figures 17a and 17b
show the image quality of a single interframe coded picture at the instant of cell loss for the
unconstrained and constrained P/M ratios. Owing to the smaller cell loss rate under the
constrained P/M ratio, the picture degradation is very marginal. The artefacts caused by the cell
loss are less disturbing than those in the scene cut frames because the lost cells do not carry
significant information. Furthermore, with the constrained P/M ratio there are no coding
impairments in the non-lossy areas, since no constraint is imposed on the coding of this frame.


Figure  16:  PSNR  of  the  reconstructed  video  after  the  occurrence  of  cell  loss  at  an  interframe 
coded  frame. 


8  CONCLUSIONS 

A method of shaping the video traffic generated by an H.261-type VBR video codec was
proposed. The proposed mechanism incorporates a control function to regulate both the mean and
peak bit rates. At the intraframe coded frames, where the peak bit rates are generated, the video
encoder limits its generated bit rate through a leaky bucket mechanism. The bit rates are
controlled by adjusting the quantiser step size of the encoder. The adjustment of the quantiser
step size is based on a comparison between the actual generated bit rate and the target bit rate
within a specified window duration, and the decision is made at the start of each video frame.
By defining the MBR and PBR (and hence the P/M ratio), the encoder is able to select the
minimum suitable quantiser step size for coding.
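The control loop summarised above can be illustrated with a minimal sketch. It is not the authors' algorithm: the one-step change of the quantiser index, the sliding window measured in frames, and the H.261-style index range 1 to 31 are assumptions, `QuantiserController` is a hypothetical name, and the leaky-bucket peak-rate limit is represented only by the per-frame bit target passed in.

```python
from collections import deque


class QuantiserController:
    """Hypothetical per-frame quantiser control: compare the bits generated in a
    sliding window of recent frames with a target and nudge the step size."""

    def __init__(self, target_bits_per_frame: float, window_frames: int,
                 q_init: int = 8, q_min: int = 1, q_max: int = 31):
        self.target = target_bits_per_frame
        self.window = deque(maxlen=window_frames)  # bits of the most recent frames
        self.q = q_init
        self.q_min, self.q_max = q_min, q_max

    def record_frame(self, bits: int) -> None:
        """Store the number of bits the encoder produced for the last frame."""
        self.window.append(bits)

    def step_size_for_next_frame(self) -> int:
        """Decision taken at the start of each frame."""
        if self.window:
            generated = sum(self.window)
            target = self.target * len(self.window)
            if generated > target:      # over budget: coarser quantisation
                self.q = min(self.q + 1, self.q_max)
            elif generated < target:    # under budget: finer quantisation
                self.q = max(self.q - 1, self.q_min)
        return self.q


ctrl = QuantiserController(target_bits_per_frame=1000, window_frames=2)
for bits in (1500, 1500, 100):
    ctrl.record_frame(bits)
    print(ctrl.step_size_for_next_frame())   # 9, then 10, then 9
```

Clamping the index keeps the step size inside the range the bitstream can signal; a real encoder would also apply the leaky-bucket test at intraframe coded frames.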

It was demonstrated that there is an optimum value of PBR for a given MBR (optimum P/M ratio),
at which both network performance and perceived image quality are optimised. This PBR is lower
than the unconstrained PBR generated by video codecs, and is the value that can be declared by
the user.

From the coding point of view, the performance of the shaping mechanism under optimum
selection of its parameters is better than that of the uncontrolled method, both subjectively and
in terms of PSNR. It was shown that, for a typical video sequence incorporating scene cuts, the
cell loss rate can be as low as one sixth of that of the unconstrained method at low link
utilisation.
Figure 17 A non-scene cut frame (interframe coded) with cell loss rate of a) 27%
unconstrained and b) 4.5% optimum constrained.
Abstract

In  this  article  we  study  the  performance  of  a  real  ATM  switching  network  designed  for 
incorporation  into  existing  switching  systems.  Because  of  its  architecture  based  on  the  ATM 
Composite  technique,  this  new  network  will  give  access  to  Broad-Band  services  (images, 
data,  etc.)  while  still  remaining  compatible  with  the  constraints  inherent  in  Narrow-Band 
services  such  as  speech  or  high-quality  sound. 

The  focus  here  is  on  the  investigation  of  the  system  behaviour  to  determine  dimensioning  and 
call  acceptance  rules  yielding  a  high-performance  network  for  both  Narrow-Band  and 
Broad-Band  services. 

To  make  a  complete  study  of  the  performance  of  such  a  network,  three  calculations  are  carried 
out  in  turn: 

•  Blocking  of  64Kb/s  connections,  i.e.  the  probability  that  a  64Kb/s  call  will  be  rejected 
because  no  route  is  available  (shortage  of  composite  ATM  cells); 

•  Blocking  of  Broad-Band  services  connections,  i.e.  the  probability  that  a  VBR  or 
CBR-type  call  will  be  rejected  because  no  bandwidth  is  available; 

•  The  Cell  Delay  Variation  (CDV)  of  the  cells  carrying  the  services. 

The main contributions of this paper are, first, the application of overflow traffic theory, which
gives an exact solution for the number of cells required to handle 64 Kbit/s services with the
ATM Composite technique, and second, the derivation of a very accurate formula for the
blocking probability of Broad-Band services, which yields a good network dimensioning rule
and efficient call acceptance algorithms.

Keywords 

ATM,  Broad-Band,  blocking,  call  acceptance,  cell  delay  variation,  CLOS  network,  composite 
technique,  Narrow-Band,  performance,  switching  network,  statistical  multiplexing. 

INTRODUCTION 

Evaluating the performance and determining call acceptance procedures for ATM switching
networks is critical for the choice of an architecture. This study evaluates an ATM switching
network that uses the composite technique for Narrow-Band services, combined with statistical
multiplexing for Broad-Band services.

The architecture of the network being studied is shown in Figure 1.
In this network, 64Kb/s services (Narrow-Band services) are processed using the ATM
Composite technique (ref. 1).

The principle of this technique is to combine time slots of incoming PCMs from the same ATM
Composite matrix (T/AC), intended for the same outgoing matrix (AC/T), into one or more cells
reserved within a virtual circuit established for that direction.

Broad-Band services, on the other hand, described by a three-state model (Passive/On/Off), are
multiplexed statistically at the entrance to the network (MUX), according to a call acceptance
rule which guarantees the quality of service at the cell and call levels. As for 64Kb/s services,
virtual circuits are established per call within the core of the network.


Basically, the core of the ATM network has a CLOS (ref. 2) structure, which ensures that for any
service accessing an incoming link it is possible to establish a path, i.e. a virtual circuit, within
the network. At this level, the only constraint is the network crossing delay of the cells, which
depends on the load of the links.

The  performance  of  such  a  network  is  therefore  described  by  the  blocking  probability  for 
64Kb/s  and  Broad-Band  calls,  and  by  the  crossing  delay  of  the  cells.  From  the  blocking  of 
calls  we  can  deduce  the  permissible  load  on  the  network's  internal  links.  This  load  is  then  used 
to  determine  the  crossing  delay. 


Figure  1 


1. Study of blocking of 64Kb/s connections

1.1 The Narrow-Band switching matrix and the ATM Composite

Using the ATM technique to transport 64Kb/s services carried on PCM frames basically requires
a method of adapting information from frame format to ATM format. To solve this problem, it is
advantageous to use an adaptation layer ("composite") in which the payload of an ATM cell is
made up of time slots from several 64Kb/s channels.

This technique involves creating, on demand, virtual circuits (VCs) between the input and output
matrices (T/AC and AC/T) that connect the PCMs to the ATM network. The time slots of PCMs
from the same input matrix to the same output matrix are gathered in one or more cells carried
by virtual paths (VPs); each connection is fully defined by its unique VPI (Virtual Path
Identifier) and VCI (Virtual Channel Identifier).
The principle is as follows:

1) In the incoming T/AC matrix (I), a connection is set up between:

•  the  time  slot  of  a  64Kb/s  connection  carried  on  an  incoming  PCM  frame  (two  bytes  per 
time  slot) 

•  and  two  free  bytes  contained  in  the  information  field  (the  payload)  of  an  ATM  cell 
responsible  for  transporting  information  from  that  incoming  T/AC  matrix  (I)  to  the 
appropriate  outgoing  AC/T  matrix  (O). 

2)  In  the  ATM  switching  network  a  Virtual  Path  (VP)  connection  transports  the  various  cells 
between  matrices  (I)  and  (O). 

3)  In  the  outgoing  AC/T  matrix  (O),  a  connection  is  established  between: 

•  the  two  bytes  contained  in  the  payload  of  an  ATM  cell  received  by  AC/T  matrix  (O) 

•  and  the  time  slot  of  the  outgoing  64Kb/s  connection  carried  on  a  PCM  frame  sent  out  by 
AC/T  matrix  (O). 

These unitary connections give rise to a search for free space in one or more VCs set up
between matrices (I) and (O). Twenty-three (23) places are available per VC because, of the 48
bytes of payload, two bytes are reserved for AAL1 functions. If there is not enough space, a new
VC can be set up. However, the number of VCs in use is limited by the capacity of the ATM link;
for example, on a 622 Mbit/s ATM link only 183 cells of 53 bytes are available every 125 µs.
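The two capacity figures used above, 23 places per cell and 183 cells per 125 µs, can be checked with integer arithmetic. The sketch assumes the 622.08 Mbit/s STM-4 line rate and, as a simplifying assumption, ignores SDH framing overhead, so 183 is an upper bound on the cells actually usable; the constant and function names are illustrative.

```python
PAYLOAD_BYTES = 48   # ATM cell payload
CELL_BYTES = 53      # header + payload
AAL1_BYTES = 2       # payload bytes reserved for AAL1 functions (as in the text)
BYTES_PER_SLOT = 2   # each 64 kbit/s connection occupies two payload bytes


def places_per_cell() -> int:
    """Number of 64 kbit/s connections one composite cell can carry."""
    return (PAYLOAD_BYTES - AAL1_BYTES) // BYTES_PER_SLOT


def cells_per_frame(line_rate_bps: int, frame_us: int = 125) -> int:
    """Whole cells that fit in one 125 us PCM frame period at the given line rate."""
    bits = line_rate_bps * frame_us // 1_000_000
    return bits // 8 // CELL_BYTES


print(places_per_cell())             # 23
print(cells_per_frame(622_080_000))  # 183
```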

It can then happen that no space is available in the cells already open for a given destination
and that no new cell can be opened, even though free places remain for other directions. In this
case the system cannot accept the incoming call, which is then blocked.

1.2  Modeling 

Let us consider one of the D T/AC input matrices. From the p PCMs connected to it, it receives
traffic of intensity A generated by subscriber and circuit access units. Since these units, which
concentrate the traffic of a large number of sources, reject only a negligible amount of traffic,
the traffic offered to this matrix can be taken to follow a Poisson distribution.

This matrix then offers its traffic to the D AC/T matrices at the output stage. It distributes the
traffic among them with equal probability, so the traffic offered in any given direction also
follows a Poisson distribution, of intensity A/D.

Now let us take the case of a network with D directions being offered a traffic A such that, on
average, one cell with 23 connections in any direction suffices to carry the traffic A/D.
Fluctuations in the offered traffic may then mean that additional cells are needed in some of the
D output directions. For a given direction, the effect is as if we had the following service
system:

Traffic of intensity A/D is initially offered to the first cell in one of the D directions. Any call
arriving when all 23 connections in this cell are occupied is offered to a second cell. We then
have a traffic overflow, the excess traffic from the first cell being offered to the second. This
procedure could in turn lead to traffic being rejected by the second cell and lost completely, but
in our study the total loss probability is so small that it can be taken as zero.


[Figure: traffic A/D offered to the first cell, with overflow to a second cell]
1.2.1 Analysis of the state space and transition probabilities

The system described above corresponds exactly to an overflow system consisting of two
subsystems, which we shall call (N) and (S). A state (n, s) is described by the number n of
connections occupied in (N), where n ≤ N, and the number s of connections occupied in (S),
where s ≤ S. The behaviour of incoming connections (calls) depends on the state of (N) and (S):
calls are directed to (N) in the first instance and, when (N) is full, they are redirected to (S).
This means that the state of (N) is independent of (S), but not vice versa.

This  type  of  system  has  been  studied  by  many  authors.  In  the  next  sections  we  will  follow 
Brockmeyer's  analysis  (ref.3). 

The behaviour of (N) is a birth and death process with a number of states limited to N; that of
(S) is a pure death process as long as n < N, because (S) is then not fed with calls: as long as
n < N, the calls already in (S) can only end. (S) receives calls only when (N) has all its 23
connections engaged; the process then becomes a birth and death one.

Figure  3 

The state graph of Figure 3 describes the state space. From it we can derive the "equations of
future" and the system's state equations.

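We denote by P_s^n the probability of state (n, s), the subscript referring to (S) and the
superscript to (N).

The "equations of future" derived from the graph are, with arrival rate λ and μ_k the
termination rate when k connections are busy:

$$
\begin{aligned}
P_0^0(t+\Delta t) &= P_0^1(t)\,\mu_1\Delta t + P_1^0(t)\,\mu_1\Delta t + P_0^0(t)\,(1-\lambda\Delta t)\\
P_s^n(t+\Delta t) &= P_s^{n-1}(t)\,\lambda\Delta t + P_{s+1}^{n}(t)\,\mu_{s+1}\Delta t + P_s^{n+1}(t)\,\mu_{n+1}\Delta t + P_s^n(t)\,[1-(\lambda+\mu_n+\mu_s)\Delta t] \quad (n<N)\\
P_s^N(t+\Delta t) &= P_s^{N-1}(t)\,\lambda\Delta t + P_{s-1}^{N}(t)\,\lambda\Delta t + P_{s+1}^{N}(t)\,\mu_{s+1}\Delta t + P_s^N(t)\,[1-(\lambda+\mu_N+\mu_s)\Delta t] \quad (s<S)\\
P_S^N(t+\Delta t) &= P_S^{N-1}(t)\,\lambda\Delta t + P_{S-1}^{N}(t)\,\lambda\Delta t + P_S^N(t)\,[1-(\mu_N+\mu_S)\Delta t]
\end{aligned}
$$

Taking the mean holding time as the unit of time (so that λ = A/D and μ_k = k), the state
equations corresponding to statistical balance are then:

$$
\begin{aligned}
&P_0^0\,A/D - P_0^1 - P_1^0 = 0\\
&P_s^n\,(A/D+n+s) - P_s^{n-1}\,A/D - P_s^{n+1}\,(n+1) - P_{s+1}^{n}\,(s+1) = 0 \quad (n<N)\\
&P_s^N\,(A/D+N+s) - P_s^{N-1}\,A/D - P_{s-1}^{N}\,A/D - P_{s+1}^{N}\,(s+1) = 0 \quad (s<S)\\
&P_S^N\,(N+S) - P_S^{N-1}\,A/D - P_{S-1}^{N}\,A/D = 0
\end{aligned}
\qquad (2)
$$

with P_s^n = 0 for any state outside 0 ≤ n ≤ N, 0 ≤ s ≤ S.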
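Brockmeyer's closed form can be cross-checked by solving the balance equations (2) numerically. The sketch below (function names hypothetical; NumPy assumed available) builds the generator of the two-cell overflow system with unit mean holding time, solves for the stationary probabilities, and returns the probability that the second cell carries at least one connection, i.e. the quantity P used for dimensioning. Because (N) evolves independently of (S), the marginal distribution of n must be the truncated Poisson (Erlang) distribution, which gives a convenient check on the solver.

```python
import numpy as np


def overflow_state_probs(a: float, N: int = 23, S: int = 23) -> np.ndarray:
    """Stationary probabilities p[n, s] of the two-cell overflow system.

    a: traffic A/D offered to one direction, in erlangs (unit mean holding time).
    n: connections busy in the first cell (N places); s: in the second (S places).
    """
    def idx(n: int, s: int) -> int:
        return n * (S + 1) + s

    size = (N + 1) * (S + 1)
    Q = np.zeros((size, size))            # infinitesimal generator
    for n in range(N + 1):
        for s in range(S + 1):
            i = idx(n, s)
            if n < N:
                Q[i, idx(n + 1, s)] += a  # call placed in the first cell
            elif s < S:
                Q[i, idx(n, s + 1)] += a  # first cell full: call overflows
            if n > 0:
                Q[i, idx(n - 1, s)] += n  # a first-cell connection ends
            if s > 0:
                Q[i, idx(n, s - 1)] += s  # a second-cell connection ends
            Q[i, i] = -Q[i].sum()
    # Solve p Q = 0 together with sum(p) = 1 (one balance equation replaced).
    A = Q.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(size)
    b[-1] = 1.0
    return np.linalg.solve(A, b).reshape(N + 1, S + 1)


def prob_second_cell_used(a: float, N: int = 23, S: int = 23) -> float:
    """Probability that at least one connection is carried by the second cell."""
    p = overflow_state_probs(a, N, S)
    return float(p[:, 1:].sum())


# 1843 E spread over 128 directions, the example of Section 1.3.
print(prob_second_cell_used(1843 / 128))
```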
1.2.2 Solution of the set of equations


To solve the above set of equations, Brockmeyer introduces the polynomials defined by:

$$S_r^m(x)=\sum_{v=0}^{m}\binom{r+v}{v}\frac{x^{m-v}}{(m-v)!},\qquad S_r^m=0\ \text{if}\ m<0\ \text{or}\ r<0 \qquad (3)$$


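The solution is then written as follows (ref. 3). Each state probability P_i^j (i connections busy
in (S), j in (N)) is an alternating sum, over x = 0, 1, ..., S - i, of terms (-1)^x K_{i+x} C(i+x, i),
each multiplied by a polynomial of type (3), where

$$K_k=\sum_{r=k}^{S}(-1)^{r-k}\binom{r}{k}\,\alpha_r$$

and the constants α_r are themselves expressed in terms of polynomials of type (3).   (4)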


The distribution of the overflow, i.e. of the number i of connections busy in the second cell,
given by

$$Q(i)=\sum_{r=0}^{N}P_i^r$$

then takes the same alternating-sum form (5).

And so, in our application, the probability P that more than one cell will be used is:

$$P=\sum_{i=1}^{S}Q(i)\qquad(6)$$

where S = N = 23.


The total number of cells required in all D directions can then be easily obtained. Since traffic
is offered independently to each of the D directions, the distribution of the number of cells
required is given by the binomial law:

$$P(N_c)=C_D^k\,P^k(1-P)^{D-k}$$

where P(N_c) is the probability that exactly N_c = 2k + (D - k) cells will be engaged, and P is
the probability given by (6). Call blocking is said to occur whenever N_c exceeds a given value
N_max (N_max = 183 in our study, which is the maximum number of cells available every 125 µs
on a 622 Mbit/s ATM link).


The average number of cells engaged is:

$$\bar N_c=[(1-P)+2P]\,D=(1+P)\,D$$

The occupancy rate ρ of the ATM links is derived from this number, i.e. the probability that a
cell is engaged is:

$$\rho=\frac{\bar N_c}{N_{max}}$$

Generalisation 


The result above can be applied to cases where x cells are systematically used in each direction
(x = ⌊A/(23D)⌋); N is then set to 23x and S remains at 23. In this case it is assumed that the
probability of using fewer than x cells or more than x + 1 cells in each direction is negligible.


The distribution of the total number of cells required in all D directions is then:

$$P(N_c)=C_D^k\,P^k(1-P)^{D-k}\qquad(7)$$

where P(N_c) is the probability of having exactly N_c = k(x+1) + (D-k)x cells engaged, and the
average number of cells engaged is:

$$\bar N_c=[(1-P)\,x+P\,(x+1)]\,D\qquad(8)$$
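Once P has been obtained from (6), the binomial law (7), the blocking criterion N_c > N_max, the mean (8) and the occupancy rate ρ = N̄_c/N_max follow directly. In the sketch below (function names hypothetical), P is an input; the value P = 0.1 in the usage line is purely illustrative and is not a result from the paper. With x = 1 the formulas reduce to the two-cell case.

```python
from math import comb


def cell_count_distribution(D: int, x: int, P: float) -> dict:
    """Binomial law (7): probability that exactly Nc = D*x + k cells are engaged,
    k being the number of directions that need an extra (overflow) cell."""
    return {D * x + k: comb(D, k) * P**k * (1 - P) ** (D - k) for k in range(D + 1)}


def blocking_probability(D: int, x: int, P: float, n_max: int = 183) -> float:
    """Probability that more than n_max cells would be needed (call blocking)."""
    return sum(pr for nc, pr in cell_count_distribution(D, x, P).items() if nc > n_max)


def mean_cells(D: int, x: int, P: float) -> float:
    """Mean number of engaged cells, (8): [(1 - P) x + P (x + 1)] D."""
    return ((1 - P) * x + P * (x + 1)) * D


def occupancy(D: int, x: int, P: float, n_max: int = 183) -> float:
    """Occupancy rate rho: mean engaged cells over cells available per 125 us."""
    return mean_cells(D, x, P) / n_max


D, x, P = 128, 1, 0.1  # illustrative values only
print(mean_cells(D, x, P), occupancy(D, x, P), blocking_probability(D, x, P))
```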


1.3  Application 


Given a loss probability, a study of the performance of a switching network consists in
determining the traffic load that can be offered and calculating the mean number of cells used
(which is involved in the CDV calculation).

We  will  therefore  apply  the  results  derived  above  and  compare  the  numerical  values  obtained 
against  those  found  by  simulation. 
The following two graphs give an example of the results we obtained. As can be seen, the
agreement between the simulation and the calculation is perfect.


The first graph, derived from result (7), gives, for the capacity of the multiplex used, the
probability that calls are blocked. With a 622 Mb/s multiplex (183 cells), this probability can be
seen to be negligible, even with loads of 1843 E per TCA on incoming PCMs, as in our example
of implementation.

This is the type of result which will be used to dimension the system (to determine the number
of PCMs that can be connected).


The second graph (derived from result (8)), with its unusual shape, gives the cell load of the links
inside the network for a given load of PCMs and a given number of TCA matrices in use. The
particular shape arises as follows: as the number of directions (matrices) grows, the number of
cells strictly required and the number of additional (overflow) cells tend to increase. However,
for certain configurations the cells are more or less filled to their optimum potential, allowing
the same number of cells to be used for different numbers of directions (matrices).


From these two graphs we can see, first, that in the case of 1843 E offered to 128 directions, the
probability of needing more than 150 cells is 10^-4 (Figure 4), so the probability of needing
more than 183 cells (call blocking) is negligible, and second, that in this case the mean number
of used cells is 138 (Figure 5).

We  deduce  from  this  last  result  that  the  occupancy  rate  of  a  cell  is  equal  to  p  =  138/183  = 
0.754. 
2. ATM sources multiplexing / Blocking for the Broad-Band service connections

In this second part we shall consider the probability that the 622 Mb/s links are saturated by
ATM sources (terminals) with variable rates (VBR). Our research looks at a system consisting of
K different types of sources with variable rates, with N_1, N_2, ..., N_K sources of each type. The
operating principle we have adopted can be summarised as follows:

Sources connected to the system generate calls which are accepted or not, depending on their
traffic characteristics and on how busy the multiplex is.

We assume that the system can identify at any moment the number of calls of each type already
accepted in the network, on each link. This enables us to know at any moment the statistical
characteristics of the traffic offered to the multiplexes. A call of type j is then accepted if there
is sufficient bandwidth left to take it; otherwise it is rejected.

A counter is incremented for each type of call whenever a new call is accepted, and decremented
when the call ends. A table describes the combinations (n_1, n_2, ..., n_K) of active sources of
types 1, 2, ..., K that are compatible with a given maximum probability of saturation of the
multiplex, as defined below. A new call is then accepted only if its characteristics are
compatible with the content of the table.
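The counter-and-table mechanism just described can be sketched as follows. The class name, and the representation of the table as a Python set of admissible tuples (n_1, ..., n_K), are assumptions made for illustration; how the table is filled is the subject of Section 2.1.2.

```python
class CallAcceptance:
    """Hypothetical call acceptance controller for one multiplex.

    admissible: set of tuples (n1, ..., nK) of simultaneously active sources
    whose saturation probability meets the target (the 'table').
    """

    def __init__(self, admissible: set[tuple[int, ...]], k_types: int):
        self.admissible = admissible
        self.active = [0] * k_types  # one counter per source type

    def request(self, j: int) -> bool:
        """Accept a new call of type j if the resulting mix is in the table."""
        candidate = list(self.active)
        candidate[j] += 1
        if tuple(candidate) in self.admissible:
            self.active[j] += 1      # counter incremented on acceptance
            return True
        return False                 # call rejected, counters unchanged

    def release(self, j: int) -> None:
        """Decrement the counter when a call of type j ends."""
        if self.active[j] == 0:
            raise ValueError("no active call of this type")
        self.active[j] -= 1


# Toy table for two types: at most two type-0 calls, or one call of each type.
cac = CallAcceptance({(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)}, k_types=2)
print(cac.request(0), cac.request(1), cac.request(0))   # True True False
```

Because the table is precomputed, each admission decision is a single set lookup, which keeps the per-call processing cost independent of the complexity of formula (13).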


In a first step, for each type of source, we calculate the probability that the multiplex will be
saturated given that n_1 of the N_1 connected sources of type 1, n_2 of the N_2 sources of type 2,
..., n_K of the N_K sources of type K are active. Next, we establish a dimensioning rule which
allows us to determine easily the number of sources of each type (N_1, N_2, ..., N_K) which can
be connected to the multiplexer.


2.1 Multiplex saturation probability

2.1.1  Definition  of  sources 


Here we shall consider a three-state source model (Figure 6) suggested by the ATM Forum
Technical Committee. A source can be either ACTIVE (making a call) or PASSIVE.

When it is active, the source generates a series of packets or bursts (ON) separated by short
pauses (OFF).


Figure  6 
Now let us consider sources of different types, following the three-state model just described
but with transmission rates f_i that differ from each other. In particular, we might consider the
connection of Distributed Computing Environment (DCE) sources at 2 Mb/s, 4 Mb/s, 30 Mb/s or
even 150 Mb/s. These different rates lead to different values of T_ON and T_ACTIVE. Note that
this model applies to SBR-type sources and equally to CBR/DBR-type sources, although for the
latter the ON state is the same as the ACTIVE state.

Since the multiplex is characterised by a rate f, each source of type i, as seen by the multiplex, is
characterised by D_i = f/f_i time slots spaced out over the cells that constitute a burst (i.e. the
multiplex can handle D_i sources of type i at the same time).

We will assume that D_max = max_i {D_i} and d_i = D_max/D_i.

Thus the multiplex has to handle traffic from N_1 sources of type 1, generating calls that occupy
d_1 units of bandwidth, N_2 sources of type 2, generating calls that take up d_2 units, ..., and N_k
sources of type k, generating calls that take up d_k units, the total number of bandwidth units
available on the multiplex being D_max.
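The normalisation into bandwidth units can be made concrete with the DCE rates quoted above. The sketch assumes a 622 Mbit/s multiplex (the function name is illustrative) and uses exact fractions so that D_i = f/f_i and d_i = D_max/D_i come out exactly; note that d_i reduces to f_i/f_min, i.e. the bandwidth unit is the rate of the slowest source.

```python
from fractions import Fraction


def bandwidth_units(f: int, rates: list[int]) -> tuple[Fraction, list[Fraction]]:
    """Return D_max and the units d_i taken by one ON source of each type.

    f: multiplex rate; rates: source rates f_i (same unit, e.g. Mbit/s).
    """
    D = [Fraction(f, fi) for fi in rates]   # D_i = f / f_i
    D_max = max(D)                          # set by the slowest source
    d = [D_max / Di for Di in D]            # d_i = D_max / D_i
    return D_max, d


D_max, d = bandwidth_units(622, [2, 4, 30, 150])
print(D_max, [int(x) for x in d])   # 311 [1, 2, 15, 75]
```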

2.1.2  Blocking  probability  calculation 

Having defined the source model, we can now calculate the probability of having r_1 sources of
type 1, r_2 sources of type 2, ..., r_k sources of type k, all in the ON state, when we know that
n_1, n_2, ..., n_k sources of each type are active. When only one type of source is used, this
probability is given by Engset's law (we consider only time congestion here):


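$$P_{ON}(n)=\frac{C_M^n\,p_{ON}^n\,(1-p_{ON})^{M-n}}{\sum_{i=0}^{N}C_M^i\,p_{ON}^i\,(1-p_{ON})^{M-i}},\qquad n\le N \qquad (9)$$

where P_ON(n) is the probability of having n sources ON, knowing that M sources are active and
that at most N (N < M) sources can be ON at the same time, and p_ON is the probability that an
active source is ON.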
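Formula (9) is a binomial distribution truncated at N. For homogeneous sources each taking one bandwidth unit, blocking occurs only when all D_max units are ON, so P_ON(D_max) is the time congestion, and substituting M - 1 for M gives the call congestion (as in footnote 2 of Table 8). A direct sketch, with a hypothetical function name:

```python
from math import comb


def engset_on_probability(n: int, M: int, N: int, p_on: float) -> float:
    """Formula (9): probability that n of M active sources are ON when at most
    N can be ON simultaneously (binomial distribution truncated at N)."""
    if not 0 <= n <= N:
        return 0.0

    def term(i: int) -> float:
        return comb(M, i) * p_on**i * (1 - p_on) ** (M - i)

    return term(n) / sum(term(i) for i in range(N + 1))


# Homogeneous sources, one bandwidth unit each: blocked when all D_max units are ON.
M, D_max, p_on = 36, 12, 0.1
time_congestion = engset_on_probability(D_max, M, D_max, p_on)
call_congestion = engset_on_probability(D_max, M - 1, D_max, p_on)
print(f"{call_congestion:.3g} {time_congestion:.3g}")
```

The parameters (36, 12, 0.1) are those of one row of Table 8, so the printed values can be compared with that row.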

When k different types of sources are used, the probability of having r_1, r_2, ..., r_k sources ON,
given that n_1, n_2, ..., n_k sources of each type are active, can be expressed, as before, as the
ratio between the probability of the favourable cases and the probability of all possible cases.


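Thus we obtain:

$$P_{ON}^{(n_1,\ldots,n_k)}(r_1,\ldots,r_k)=\frac{\prod_{i=1}^{k}C_{n_i}^{r_i}\,p_{ON_i}^{r_i}(1-p_{ON_i})^{n_i-r_i}}{\sum_{(x_1,\ldots,x_k)\,:\,x_1d_1+\cdots+x_kd_k\le D_{max}}\;\prod_{i=1}^{k}C_{n_i}^{x_i}\,p_{ON_i}^{x_i}(1-p_{ON_i})^{n_i-x_i}} \qquad (10)$$

where r_1 d_1 + r_2 d_2 + ... + r_k d_k ≤ D_max.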
The result above is simply Engset's generalised law, very similar to Erlang's generalised law
given in (ref. 4), which is easily obtained from it by letting the values n_i (1 ≤ i ≤ k) tend to
infinity while keeping the products n_i p_ON_i constant.

The probability of having r units of bandwidth engaged, given that n_1, n_2, ..., n_k sources are
active, is then:

$$P_r=\sum_{(r_1,\ldots,r_k)\,:\,r_1d_1+r_2d_2+\cdots+r_kd_k=r}P_{ON}^{(n_1,\ldots,n_k)}(r_1,\ldots,r_k)\qquad(11)$$


An active source of type j is said to be blocked when the remaining bandwidth is inadequate,
which is equivalent to having r_1, r_2, ..., r_k sources ON such that:

$$\sum_{i=1}^{k}r_i\,d_i>D_{max}-d_j\qquad(12)$$

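Therefore, if P_Bj is the probability that sources of type j are blocked, P_Bj is expressed as:

$$P_{B_j}=\frac{\displaystyle\sum_{r=D_{max}-d_j+1}^{D_{max}}\;\sum_{(x_1,\ldots,x_k)\,:\,x_1d_1+\cdots+x_kd_k=r}\;\prod_{i=1}^{k}C_{n_i}^{x_i}\,p_{ON_i}^{x_i}(1-p_{ON_i})^{n_i-x_i}}{\displaystyle\sum_{(r_1,\ldots,r_k)\,:\,r_1d_1+\cdots+r_kd_k\le D_{max}}\;\prod_{i=1}^{k}C_{n_i}^{r_i}\,p_{ON_i}^{r_i}(1-p_{ON_i})^{n_i-r_i}}\qquad(13)$$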


Given a maximum call blocking probability P_Bloc,i for each type of source i, formula (13) above
may be used to determine all the combinations (n_1, n_2, ..., n_k) of active sources which satisfy
P_Bi < P_Bloc,i. The intersection of the sets thus defined for each type of source makes up a set
S of combinations (n_1, n_2, ..., n_k) of active sources such that P_Bi < P_Bloc,i for all i.


The set S and the count of active sources of each type form the basis of the call acceptance
procedure (sources in active state).
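Formula (13) and the construction of the set S can be sketched by brute-force enumeration of the ON vectors (r_1, ..., r_k). This is practical only for small numbers of sources and types, and a recursion over engaged bandwidth units would scale better; the function names, and the argument `targets` standing for the P_Bloc,i values, are hypothetical.

```python
from itertools import product
from math import comb, prod


def blocking_probabilities(n, d, p_on, D_max):
    """Formula (13): P_Bj for each type j, given n[i] active sources of type i,
    each taking d[i] bandwidth units when ON with probability p_on[i]."""
    def weight(r):
        return prod(comb(ni, ri) * pi**ri * (1 - pi) ** (ni - ri)
                    for ni, ri, pi in zip(n, r, p_on))

    load = {}  # total weight of feasible ON vectors, keyed by engaged units
    for r in product(*(range(ni + 1) for ni in n)):
        units = sum(ri * di for ri, di in zip(r, d))
        if units <= D_max:
            load[units] = load.get(units, 0.0) + weight(r)
    total = sum(load.values())
    # Type j is blocked when fewer than d_j units remain free, condition (12).
    return [sum(w for u, w in load.items() if u > D_max - dj) / total for dj in d]


def admissible_set(n_max, d, p_on, D_max, targets):
    """Set S: active-source mixes (n1, ..., nK) with P_Bi < target_i for all i."""
    S = set()
    for n in product(*(range(m + 1) for m in n_max)):
        pb = blocking_probabilities(n, d, p_on, D_max)
        if all(b < t for b, t in zip(pb, targets)):
            S.add(n)
    return S


print(blocking_probabilities((1, 1), (1, 2), (0.5, 0.5), 2))  # approximately [0.333, 0.667]
```

With a single type and d = 1 the function reduces to the truncated binomial (9) evaluated at D_max, which is a useful consistency check.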

As shown below, it is easy to verify, particularly in the case of identical ON-OFF sources, that
our result (13) gives a very good estimate of the "knee" of the distribution of cells in a
multiplexer queue.
2.1.3 Practical meaning of the formula

Let us consider the superposition of ON-OFF bursty sources.

It is widely recognised that, in a multiplexer with an infinite queue, the probability distribution
of waiting cells, P(>x), can be divided into two parts: one corresponding to the cell component
and the other to the burst component.


As demonstrated in (ref. 5, p. 442), the cell part can be calculated very precisely by applying the
ΣD_i/D/1 formula to the mean bit rate and, at low load, a good approximation is obtained with
the M/D/1 formula.


The burst part has also been widely investigated in the literature (refs. 5, 6, 7, 8, 9). For this part
of the curve, the results of many studies show that the queue length distribution depends
essentially on the burst length. For short bursts the slope of the curve is rather steep, so
increasing the buffer capacity makes it possible to limit the loss probability.

However, it is important to note that if bursts are long enough (which is the more general case),
the slope is so shallow that extremely large queues would be necessary. In this case, increasing
the buffer size does not provide a significant multiplexing gain and, moreover, is not compatible
with services with real-time constraints (it is preferable to have queues with different priorities).

It therefore seems realistic to use formula (13) to determine the traffic load offered to the
multiplexer so as to remain just before the "knee" of the curve (point K0), in the cell part.

Indeed, as explained in (ref. 5, p. 448), in this part we are in a situation where the probability
that the input rate exceeds the multiplexer capacity is negligible, whereas in the burst region the
behaviour of the queue is governed by total saturation of the multiplex during peak periods: the
probability of having too many bursts simultaneously active is high compared with the
probability of congestion at the cell level.


That is why formula (13), derived to calculate the probability that sources are blocked, gives a
very good estimate of the "knee" of the distribution of cells in a multiplexer queue (point K0,
Figure 7). This has been verified through several comparisons, in particular for homogeneous
ON-OFF traffic sources, as demonstrated below. We shall now call this quantity the burst level
call blocking probability.
Table 8 sums up the results of our comparisons against simulation examples found in the
literature or obtained internally.


Table 8

Ref.  n, Dmax, Pon       Simulation   Calculated call   Calculated time
                         value (1)    congestion (2)    congestion (3)
[1]   80, 48, 0.35       2·10^-6      1.81·10^-6        2.94·10^-6
[2]   33, 16, 0.375      5·10^-2      5.25·10^-2        6.50·10^-2
      44, 16, 0.285      6.5·10^-2    6.35·10^-2        7.26·10^-2
      75, 16, 0.1661     7·10^-2      6.75·10^-2        7.23·10^-2
[3]   36, 12, 0.1        1·10^-4      7.39·10^-5        9.98·10^-5
      36, 12, 0.2        3·10^-2      2.04·10^-2        2.46·10^-2
[4]   100, 15, 0.0435    2·10^-5      1.93·10^-5        2.17·10^-5

(1) Since we only found simulation curves, the values in this column are approximate.
(2) Call congestion is obtained by substituting n − 1 for n in formula (13).
(3) Time congestion is derived directly from formula (13).

[1] Information technologies and sciences. COST 224, p. 185, 1992.
[2] ANICK D., MITRA D., SONDHI M.M. Stochastic Theory of a Data-Handling System with Multiple Sources. The Bell System Technical Journal, Vol. 61, No. 8, pp. 1871-1894, Oct. 1982.
[3] Internal simulations.
[4] YANG T., TSANG D.H.K. A Novel Approach to Estimating the Cell Loss Probability in an ATM Multiplexer Loaded with Homogeneous ON-OFF Sources. IEEE Transactions on Communications, Vol. 43, No. 1, pp. 117-126, Jan. 1995.
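Footnote (2) of Table 8 relies on the classical Engset property: the congestion seen by an arriving call (call congestion) equals the time congestion of the same system with one source fewer. Formula (13) is not reproduced in this section. The sketch below therefore illustrates the property with the textbook Engset loss formula as a stand-in. The function names, and the parameterisation by the offered traffic per idle source `beta` (taken here as Pon/(1 − Pon)), are our own assumptions. The numbers it produces are not expected to reproduce Table 8.

```python
from math import comb


def engset_time_congestion(n: int, c: int, beta: float) -> float:
    """Textbook Engset time congestion: probability that all c units are busy.

    n    -- number of sources
    c    -- number of units (e.g. peak-rate slots of the multiplex)
    beta -- offered traffic per idle source (activation rate / service rate)
    Stand-in for illustration only; this is NOT the paper's formula (13).
    """
    if c > n:
        return 0.0  # fewer sources than units: all units can never be busy
    # Truncated binomial state distribution: p(k) proportional to C(n, k) * beta**k
    weights = [comb(n, k) * beta**k for k in range(c + 1)]
    return weights[c] / sum(weights)


def engset_call_congestion(n: int, c: int, beta: float) -> float:
    """Congestion seen by an arriving call = time congestion with n - 1 sources."""
    return engset_time_congestion(n - 1, c, beta)


if __name__ == "__main__":
    # Hypothetical homogeneous case loosely modelled on the first row of Table 8
    n, c, p_on = 80, 48, 0.35
    beta = p_on / (1.0 - p_on)  # assumption: Pon mapped to offered traffic per idle source
    print(f"time congestion: {engset_time_congestion(n, c, beta):.3e}")
    print(f"call congestion: {engset_call_congestion(n, c, beta):.3e}")
```

As in Table 8, call congestion comes out below time congestion, because an arriving source only sees the other n − 1 sources.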


As Table 8 shows, for homogeneous sources the agreement between our calculations and the simulation examples we found is excellent: in every case the calculated values lie within a factor of about 1.5 of the simulated ones.


The same agreement is observed for heterogeneous sources. We performed several calculations for a mix of two source classes (50% Type 1, 50% Type 2), as shown below. Figure 9 compares the total time congestion probability with the results of [5] (a fluid-flow model) from the literature. The total time congestion is calculated from the congestion probabilities obtained for each source type (see the table below). It is interesting to note that the accuracy of our formula remains excellent even at low load: at A = 0.3 the calculated value, 9.4·10^-6, is within about 6% of the reference value of 10^-5.
Figure 9: Time congestion versus offered traffic: reference [5] and the probability of time congestion calculated with formula (13).


[5] BAIOCCHI A., BLEFARI-MELAZZI N., ROVERI A., SALVATORE F. Stochastic Fluid Analysis of an ATM Multiplexer Loaded with Heterogeneous ON-OFF Sources: an Effective Computational Approach. INFOCOM 92, pp. 405-414, 1992.


Offered traffic A    [5]        PB, formula (13)
0.3                  1E-05      9.4E-06
0.4                  0.0003     0.000235
0.5                  0.002      0.00175
0.6                  0.006      0.00468
0.7                  0.02       0.0144
0.8                  0.04       0.0334
0.9                  0.06       0.0519

[5]: fluid flow model. Two types of sources:
Type 1: Pon1 = 0.1, D1 = 7
Type 2: Pon2 = 0.5, D2 = 23
Dmax = 23, d1 = 3, d2 = 1; A = offered traffic.


A     0.3   0.4   0.5   0.6   0.7   0.8   0.9
n1    8     11    14    16    19    22    24
n2    8     11    14    16    19    22    24
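The closeness of the two columns of the Figure 9 data can be quantified directly. The short sketch below recomputes, from the values in the table above, the ratio of the formula (13) result to the reference [5] value at each offered load. No new data are involved; the names are ours.

```python
# Offered traffic A -> (reference [5] value, formula (13) value), copied from the table above
FIG9_DATA = {
    0.3: (1e-05, 9.4e-06),
    0.4: (0.0003, 0.000235),
    0.5: (0.002, 0.00175),
    0.6: (0.006, 0.00468),
    0.7: (0.02, 0.0144),
    0.8: (0.04, 0.0334),
    0.9: (0.06, 0.0519),
}


def ratios(data: dict[float, tuple[float, float]]) -> dict[float, float]:
    """Ratio formula(13) / reference [5] for each offered load."""
    return {a: calc / ref for a, (ref, calc) in data.items()}


if __name__ == "__main__":
    for a, r in sorted(ratios(FIG9_DATA).items()):
        print(f"A = {a:.1f}: formula (13) / [5] = {r:.2f}")
```

The ratio lies between about 0.72 (A = 0.7) and 0.94 (A = 0.3). Formula (13) therefore stays slightly below the fluid-flow values of [5] over the whole range.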

As a consequence of the dimensioning rule we recommend (operating at point K0), the traffic accepted on the network links can be modelled as geometric traffic. Simple formulae can therefore be used to dimension the queues within the network matrices and, most importantly, small buffers are sufficient.


2.2 Call acceptance and system dimensioning

♦ Call acceptance is based on formula (13), which determines the set ℰ of acceptable combinations (n1, n2, ..., nK) of active sources for which the probability of saturating the multiplex is below a predetermined limit.

The expansion within the Clos network guarantees that a path can be established on the basis of the peak rate, provided that the sum of the peak rates offered to an entry matrix is less than a maximum value. This value and the required expansion can be determined with formulas such as those presented in (ref. 10). Under this constraint the core of the network is strictly non-blocking. If a negligible call rejection probability is accepted in the ATM core, it is furthermore possible to reduce the expansion, and even to increase the multiplexing gain, as follows. A path is established within the network by checking, on each link, that the new call is compatible with the set of acceptable combinations that guarantees the maximum allowed saturation probability. In both cases the control keeps the operating point below the "knee", thus allowing short buffers and simple calculations of the queue size.
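The per-link test used for path establishment can be sketched as follows. The representation is entirely hypothetical:

- each link keeps a vector of active-call counts per source type;
- `admissible` is any predicate that says whether a count vector lies in the acceptable set ℰ;
- middle-stage switches are tried in order (first fit).

A real switch controller would differ in many details.

```python
from typing import Callable, Optional

Counts = tuple[int, ...]  # number of active calls of each source type on one link
Admissible = Callable[[Counts], bool]


def add_call(counts: Counts, source_type: int) -> Counts:
    """Count vector after adding one call of the given type."""
    return tuple(n + (1 if k == source_type else 0) for k, n in enumerate(counts))


def select_middle_switch(
    first_links: list[Counts],   # link from the entry matrix to each middle switch
    second_links: list[Counts],  # link from each middle switch to the exit matrix
    source_type: int,
    admissible: Admissible,
) -> Optional[int]:
    """Return the first middle switch whose two links both accept the new call."""
    for m, (up, down) in enumerate(zip(first_links, second_links)):
        if admissible(add_call(up, source_type)) and admissible(add_call(down, source_type)):
            return m
    return None  # no compatible path: the call is rejected inside the core


if __name__ == "__main__":
    # Hypothetical acceptable set: type-1 calls cost 1 unit, type-2 calls 5 units, 20 units per link
    def linear_set(c: Counts) -> bool:
        return c[0] + 5 * c[1] <= 20

    first = [(10, 2), (3, 1), (0, 0)]
    second = [(0, 0), (15, 1), (2, 0)]
    print(select_middle_switch(first, second, source_type=1, admissible=linear_set))  # 2
```

The linear acceptable set in the usage example is only a placeholder. In the scheme described above, ℰ would come from formula (13).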

♦ Dimensioning the system, i.e. determining the number of sources of each type that can be connected to the multiplex, is based on the blocking probability allowed for each source type.

Having calculated the set ℰ of acceptable combinations (n1, n2, ..., nK) of active sources, we can now determine the set of combinations (N1, N2, ..., NK) of connected sources. These must be such that the probability of obtaining an unacceptable combination (n1, n2, ..., nK) of active sources (one not belonging to ℰ) falls below a predetermined limit. Once again, this probability is given by Engset's law, because the number of sources that can be active at the same time is limited. Thus, if we call E the set of combinations that form the boundary of the set ℰ of permissible combinations, we obtain:


Probability of refusing a call = …                                        (14)


The system can then be dimensioned with the following algorithm:

① For each source type i, the configurations of active sources (n1, n2, ..., nK) are determined such that the burst level call blocking probability Pi is less than the predetermined value Pblock,i (10^-7, for example), using formula (13).

② From the sets of combinations derived for each source type, the set ℰ of acceptable combinations is determined, together with its boundary E.

③ Knowing E, the set of combinations (N1, N2, ..., NK) of connectable sources is determined such that the probability of calls being refused, calculated from formula (14), falls below a predetermined level (10^-3, for example).
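Step ③ can be sketched numerically. Formula (14) is not reproduced in this section, so the sketch replaces it with a simpler stand-in, which we label as an assumption. Each of the N_i connected sources of type i is taken to be active independently with probability p_i, and the refusal probability is approximated by the probability that the active combination falls outside ℰ. For two source types, the acceptable set is passed in as a boundary table `max_n2[n1]`, in the style of the limit table in the example below. The activity probabilities in the usage lines are hypothetical.

```python
from math import comb


def binom_pmf(n: int, k: int, p: float) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def refusal_probability(N1: int, N2: int, p1: float, p2: float, max_n2: list[int]) -> float:
    """P(active combination (n1, n2) lies outside the acceptable set).

    max_n2[n1] is the largest admissible n2 when n1 type-1 sources are active;
    n1 beyond the end of the table is inadmissible. Independent binomial
    activity is an assumption standing in for formula (14).
    """
    outside = 0.0
    for n1 in range(N1 + 1):
        for n2 in range(N2 + 1):
            inside = n1 < len(max_n2) and n2 <= max_n2[n1]
            if not inside:
                outside += binom_pmf(N1, n1, p1) * binom_pmf(N2, n2, p2)
    return outside


def max_connectable_n2(N1, p1, p2, max_n2, limit, n2_cap=1000):
    """Largest N2 keeping the refusal probability below `limit` for a given N1."""
    N2 = 0
    while N2 < n2_cap and refusal_probability(N1, N2 + 1, p1, p2, max_n2) < limit:
        N2 += 1
    return N2


# Limit table from the example: index n1 = 0..20, value = maximum n2
MAX_N2 = [4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    p1, p2 = 0.3, 0.3  # hypothetical activity probabilities
    print(refusal_probability(5, 2, p1, p2, MAX_N2))    # 0.0: every active combination is acceptable
    print(max_connectable_n2(5, p1, p2, MAX_N2, 1e-3))  # 3
```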


2.3 Implementation

Although the calculations may seem complex, the call acceptance procedures are very simple to implement.

♦ A call counter is incremented when each call arrives and decremented when it ends. The state of the counter (n1, n2, ..., nK) is compared with the contents of the set ℰ of permissible combinations. Calls leading to a combination (n1, n2, ..., nK) that does not belong to ℰ are rejected, by comparison with an engineering table defining the boundary E of ℰ (see below). Given the relative complexity of the formulas, such predetermined tables seem to be an efficient solution.
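The counter-and-table mechanism can be sketched as below for two source types. The class and method names are hypothetical. The engineering table is the limit table (n1 → maximum n2) given in the example of this section, and a call is accepted only if the counter state it would create, including the new call, stays inside the acceptable set.

```python
class TableCAC:
    """Call acceptance by comparing call counters with an engineering table.

    max_n2[n1] gives the largest admissible number of active type-2 calls
    when n1 type-1 calls are active (boundary E of the acceptable set).
    """

    def __init__(self, max_n2: list[int]):
        self.max_n2 = max_n2
        self.counts = [0, 0]  # active calls of type 1 (index 0) and type 2 (index 1)

    def _admissible(self, n1: int, n2: int) -> bool:
        return n1 < len(self.max_n2) and n2 <= self.max_n2[n1]

    def request(self, source_type: int) -> bool:
        """Try to admit one call; source_type is 0 for type 1, 1 for type 2."""
        trial = self.counts.copy()
        trial[source_type] += 1
        if self._admissible(*trial):
            self.counts = trial  # counter incremented on acceptance
            return True
        return False  # rejected: the combination would leave the acceptable set

    def release(self, source_type: int) -> None:
        """Decrement the counter when a call ends."""
        if self.counts[source_type] == 0:
            raise ValueError("no active call of this type to release")
        self.counts[source_type] -= 1


# Limit table from the example: index n1 = 0..20, value = maximum n2
MAX_N2 = [4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    cac = TableCAC(MAX_N2)
    print([cac.request(1) for _ in range(5)])  # five type-2 requests: the fifth is refused
```

The table lookup costs constant time per call, which is the practical advantage of precomputing the boundary E.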


Study  of  the  performance  of  an  ATM  CLOS  switching  network 




Example

Let us take the system with two types of source shown in this diagram. Each type of source is characterised by its rate and follows the three-state model described above.

We apply the algorithm defined previously. First, we calculate the burst level call blocking probability (formula (13)) for each type of source, taking (n1, n2) sources to be active. We then draw the graph opposite (Figure 11). It shows, for each type of source, the limiting curves of the permissible configurations compatible with the burst blocking probabilities.

As expected, it is the type-2 sources that impose the severest constraints. This suggests that we should accept a higher blocking probability for sources with a high rate (type 2).

The set E, including the new call, is then given by the following table of limit combinations (n1, n2) of active sources for which the burst level call blocking probability of both source types falls below 10^-7:


n1   0   1   2   3   4   5   6   7   8   9
n2   4   3   3   3   3   3   2   2   2   2

n1   10  11  12  13  14  15  16  17  18  19  20
n2   2   1   1   1   1   1   0   0   0   0   0

Having determined this set, we then consider the dimensioning of the system. We apply formula (14) to various combinations (N1, N2) of connected sources. We then note the maximum combinations that give a calculated call rejection probability lower than 10^-3 (probability of reaching the set E, Figure 12).


Figure 11: Admissible combination set; burst level call blocking probability (K0) = 10^-7. Curves: E1 = {(n1, n2) : B1 < 10^-7} and E2 = {(n1, n2) : B2 < 10^-7}.

Figure 12: Acceptable (N1, N2) combinations; call blocking probability = 10^-3.




Part  Eight  Performance  Modelling  Studies 


We can therefore conclude, for example, that with a configuration of five type-1 sources and two type-2 sources, the following holds. The probability that a given source is prevented from transmitting because the probability of overload at the multiplex exceeds 10^-7 is less than 10^-3.

It is also interesting to note that, for a low blocking probability, there is a significant multiplexing gain only if the ratio of the multiplex rate to the source rate is large enough. Otherwise, as in our example, dimensioning is nearly equivalent to dimensioning on the basis of the peak rate.


3. Evaluation of the CDV

In this last section, we calculate the time taken for cells to cross the network, which gives an upper limit for the Cell Delay Variation (CDV). The ATM core of the network is a three-stage Clos structure with expansion. The diagram below shows the configuration adopted.

Figure 13

At each stage, cells may be delayed, because cells from different inputs may be waiting for access to the same outgoing direction. Under the usual hypothesis of independence between stages, the distribution of the total delay is the convolution of the delay distributions at each of the three stages. There is a great deal of mixing of flows in the network, as already pointed out in section 2.1.3 and observed by many authors (ref. 11). We can therefore assume that the flows within the network are Poisson. This entails computing the convolution of three M/D/1 queues.
This is easily done using the approximate formula below (ref. 5), which is very accurate even for low values of ρ:

$$ P(>x) = e^{-(1 - \rho - \ln\rho)\,x} \qquad (15) $$
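To see how good the exponential approximation is, the sketch below compares two decay rates. The first is a = 1 − ρ − ln ρ, as it appears in (15) and (17) under our reading of these formulas. The second is the exact asymptotic decay rate θ of the M/D/1 queue-length distribution, the positive root of ρ(e^θ − 1) = θ. Rates are per cell service time. The bisection solver and the function names are our own.

```python
import math


def approx_decay_rate(rho: float) -> float:
    """Decay rate a = 1 - rho - ln(rho) of the exponential approximation."""
    return 1.0 - rho - math.log(rho)


def exact_md1_decay_rate(rho: float, tol: float = 1e-12) -> float:
    """Positive root theta of rho * (exp(theta) - 1) = theta, for 0 < rho < 1.

    f(theta) = rho*(exp(theta) - 1) - theta is negative just above 0 (slope rho - 1)
    and convex, so it has one positive root, which bisection finds once bracketed.
    """
    def f(t: float) -> float:
        return rho * math.expm1(t) - t

    lo, hi = 1e-12, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0  # expand the interval until the root is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (0.3, 0.5, 0.9):
        print(f"rho = {rho}: approx a = {approx_decay_rate(rho):.4f}, "
              f"exact theta = {exact_md1_decay_rate(rho):.4f}")
```

The approximate rate comes out slightly below the exact one: about 1% lower at ρ = 0.9 and about 8% lower at ρ = 0.3. The approximation therefore errs on the side of a heavier tail.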

The convolution for three queues such that $P_i(=x) = a_i e^{-a_i x}$ (i = 1, 2, 3) is easily obtained from the Laplace transform:

$$ P^{(3)*}(s) = P_1^*(s)\,P_2^*(s)\,P_3^*(s) = \frac{a_1 a_2 a_3}{(s+a_1)(s+a_2)(s+a_3)} \qquad (16) $$

From formula (16) we can deduce:

$$ P^{(3)}(>x) = K_1 e^{-a_1 x} + K_2 e^{-a_2 x} + K_3 e^{-a_3 x} $$

where

$$ K_1 = \frac{a_1 a_2 a_3}{a_1 (a_2 - a_1)(a_3 - a_1)}, \quad K_2 = \frac{a_1 a_2 a_3}{a_2 (a_1 - a_2)(a_3 - a_2)}, \quad K_3 = \frac{a_1 a_2 a_3}{a_3 (a_1 - a_3)(a_2 - a_3)} $$

and

$$ a_i = 1 - \rho_i - \ln(\rho_i), \qquad i = 1, 2, 3 \qquad (17) $$
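Formula (17) is straightforward to evaluate. The sketch below, with function names of our own, computes P^(3)(>x) for three stages with given loads. It then searches for the smallest number of cells x for which the tail falls below a target. As written, (17) requires the three a_i to be distinct, so the sketch rejects coinciding rates. Equal loads, such as ρ1 = ρ2 = 0.3 in the numerical application below, need the limiting form of the partial-fraction expansion. The loads in the usage lines are hypothetical.

```python
import math


def stage_rate(rho: float) -> float:
    """a_i = 1 - rho_i - ln(rho_i), as in formula (17)."""
    return 1.0 - rho - math.log(rho)


def three_stage_tail(x: float, rhos: tuple[float, float, float]) -> float:
    """P3(>x) = K1 exp(-a1 x) + K2 exp(-a2 x) + K3 exp(-a3 x), distinct a_i only."""
    a = [stage_rate(r) for r in rhos]
    if len({round(ai, 12) for ai in a}) < 3:
        raise ValueError("formula (17) needs three distinct rates a_i")
    prod = a[0] * a[1] * a[2]
    total = 0.0
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        K = prod / (a[i] * (a[j] - a[i]) * (a[k] - a[i]))  # K_i of formula (17)
        total += K * math.exp(-a[i] * x)
    return total


def cells_for_target(rhos: tuple[float, float, float], target: float = 1e-10) -> int:
    """Smallest integer x (in cells) with P3(>x) <= target."""
    x = 0
    while three_stage_tail(x, rhos) > target:
        x += 1
    return x


if __name__ == "__main__":
    loads = (0.3, 0.5, 0.9)  # hypothetical, distinct per-stage loads
    x = cells_for_target(loads)
    print(x, "cells, i.e. about", round(x * 0.682, 1), "us at 682 ns per cell")
```

As in the text, the largest load dominates: at large x only the term in a3 = 1 − ρ3 − ln ρ3 matters.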


NUMERICAL APPLICATION

From the results in the sections above, we can consider a maximum cell load of 0.9 on the ATM links entering the network. This value is conservative with respect to the value obtained in section 1.3. Given, for instance, an internal structure with an expansion of three, the loads to be allowed for at each stage are ρ1 = 0.3, ρ2 = 0.3, ρ3 = 0.9.

Using (17), this gives P^(3)(>x) = 10^-10 for x = 111 cells in the system, a value determined mainly by the third stage. The maximum CDV is therefore about 76 µs (111 × 682 ns, 682 ns being one cell time). This is fully compatible with the real-time constraints of 64 kbit/s services. Bursty ATM traffic, for its part, is penalised at the call acceptance level and by blocking at the network input. As a consequence, CDV cannot be considered a real constraint in this case, and the call blocking probability remains the predominant factor for network dimensioning.
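The conversion from cells to time is simple arithmetic. The 682 ns per cell quoted above corresponds to one 53-byte (424-bit) cell at 622.08 Mbit/s. That link rate is our inference from the quoted figure, not something stated in this section, and the names below are illustrative.

```python
CELL_BITS = 53 * 8          # one ATM cell: 53 bytes
LINK_RATE_BPS = 622.08e6    # assumed link rate; 424 bits at this rate gives about 682 ns


def cell_time_ns(rate_bps: float = LINK_RATE_BPS) -> float:
    """Transmission time of one cell in nanoseconds."""
    return CELL_BITS / rate_bps * 1e9


def max_cdv_us(cells: int, rate_bps: float = LINK_RATE_BPS) -> float:
    """Delay bound in microseconds for a queue of `cells` cells."""
    return cells * cell_time_ns(rate_bps) / 1e3


if __name__ == "__main__":
    print(f"cell time: {cell_time_ns():.1f} ns")             # about 681.6 ns
    print(f"CDV for 111 cells: {max_cdv_us(111):.1f} us")    # about 75.7 us
```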

Furthermore, it is easy to verify that even with a load of 0.9 at each stage the maximum CDV is the same, with a probability of 3·10^-10. This means that, under the constraint of a negligible internal call rejection probability, expansion is not even necessary. This is particularly interesting for matrices connecting narrow-band services, for which a very low internal blocking probability is easy to obtain without any expansion.
CONCLUSION

In this study we have evaluated the performance of a switching matrix based on the ATM composite technique. We have established formulae that enable the network to be dimensioned for 64 kbit/s services and also for broadband services.

The main contributions of this paper are thus twofold. The first is the application of Brockmeyer's work on overflow systems, which enabled us to give the exact distribution of the number of cells required and the loss probability for 64 kbit/s services. The second is the derivation of a very accurate formula for the blocking probability of broadband services, which yields a good network dimensioning rule and accurate call acceptance algorithms.

From the traffic flow point of view, the results obtained show the efficiency of a Clos-type network. Such a network, with no blocking (or negligible blocking) at the VC and VP levels, is entirely effective. Furthermore, small queue capacities in each elementary switching matrix are sufficient to guarantee a good quality of service (crossing delay and loss probability).

We therefore consider that the formulae established can serve as a basis for drawing up ATM traffic control procedures. Indeed, traffic characteristics such as peak and mean bit rates, combined with call enumeration, provide the means to define engineering tables and call acceptance rules that guarantee a good quality of service.

In future work we shall study the dimensioning of the network when a very low call rejection probability is allowed within the ATM core, i.e. a quasi non-blocking network instead of a strictly non-blocking one. A negligible multiplex saturation probability would be maintained, as suggested in section 2.2.
Abstract

Fast Reservation Protocol (FRP) is a traffic control scheme intended to multiplex bursty data sources. In this paper we focus on the analysis of FRP when different sources are multiplexed together, in order to study the fairness of the protocol. We present two analytical models for the case in which a set of identical sources is multiplexed with another source of higher rate. Analytical results are compared with simulation results.


1 INTRODUCTION

In order to efficiently multiplex data transfers and LAN-LAN interconnection on the ATM B-ISDN, an in-call bandwidth negotiation called Fast Reservation Protocol (FRP) has been proposed, Boyer (1992). FRP is a kind of Connection Acceptance Control at the burst level: when a source wants to transmit a burst, the burst is accepted or blocked depending on the bandwidth available on the link. When a burst is blocked, successive reattempts are made until it is accepted. Although FRP is not a new proposal, it is still a hot topic, because it has recently been included in ITU-T Recommendation I.371 to support the ATM Block Transfer Capability.

The performance of an FRP connection is therefore measured in terms of its Burst Blocking Probability (BP) and its Blocking Time (BT, i.e. the time that a blocked burst has to wait until it is eventually accepted). Performance studies of FRP and related protocols have been carried out by several authors: Boyer (1992), Enssle (1994), Suzuki (1992) and Bernstein (1994). In those studies, however, a set of identical sources is used to model the protocol behaviour. When sources with different parameters (PCR and/or burst duration) are multiplexed together, each source type can be expected to get a different BP and BT. In this paper we focus on the analysis of FRP fairness when different sources are multiplexed. We use the term fairness in the sense of the discrepancy between the BP and BT values of different source types: if they were all equal, the network would provide fair burst access.

We assume an ON-OFF model for the data sources with exponentially distributed ON and OFF times (burst-silence model). To assess the burst blocking probability of the sources, two approximations of the protocol are considered. In the first approach we assume that the time between reattempts is zero. With different types of sources this case leads to a Markov chain that does not have a product form solution. We therefore analyze the simple situation in which a set of identical sources is multiplexed with another source of higher rate.

In the second approach we assume that the reattempt time and the OFF time are identically distributed. This assumption leads to a Markov chain with a simple product form solution, even when different source types are considered.

In the first approach, the time that a blocked burst has to wait until it is accepted is also evaluated. Analytical results are compared with simulation results.


2 OVERVIEW OF THE FRP PROTOCOL

FRP is described in Boyer (1992). Two variants of the protocol have been proposed. The first, called FRP with Delayed Transmission (FRP/DT), is intended to multiplex so-called Stepwise Variable Bit Rate sources. These sources are expected to need bandwidth in steps. However, the sources must tolerate a delay in the negotiation of a bandwidth increase. Many data communications are typical examples of such sources.

Basically, FRP/DT works as follows. When a source wants a bandwidth increase (for example, to transfer a burst), it sends a Request to the so-called FRP Control Unit, located at the ingress node. This Request is forwarded to the first switching element on the connection path, which checks whether it can allocate the bandwidth increase. If it has enough bandwidth, the Request is forwarded to the next switching element, and so on until it reaches the egress node. Eventually the egress node sends an acknowledgment back to the FRP Unit, and the source is allowed to transfer the burst. The time from the FRP Unit sending the Request until it receives the acknowledgment is called the Round Trip Time. Note that during this time the switching elements have allocated bandwidth for the source, but transmission has not yet started. This time is therefore an overhead introduced by the protocol.

If a switching element cannot allocate the requested bandwidth increase, it discards the Request, and a time-out mechanism resets the allocated resources to their previous state. In this case the FRP Unit makes successive reattempts until the bandwidth increase is accepted. The source informs the FRP Unit when an accepted burst has been transferred, so that the allocated bandwidth can be released.

The other variant, called FRP with Immediate Transmission (FRP/IT), is intended for sources that are more sensitive to delay. In this case the source transfers the burst immediately after the reservation request. If the reservation fails at any of the nodes, the whole burst is discarded.
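The hop-by-hop reservation of FRP/DT, and the discard behaviour of FRP/IT, can be sketched as follows. This is a deliberately simplified, hypothetical model with our own class and function names:

- each switching element holds a capacity and the bandwidth already allocated;
- a Request reserves bandwidth hop by hop;
- a failure at any hop releases what had been reserved upstream, standing in for the time-out reset.

Timing, signalling cells and the round trip time are ignored.

```python
from dataclasses import dataclass


@dataclass
class SwitchingElement:
    capacity: float          # link capacity, e.g. in Mbit/s
    allocated: float = 0.0   # bandwidth currently reserved

    def try_allocate(self, increase: float) -> bool:
        if self.allocated + increase <= self.capacity:
            self.allocated += increase
            return True
        return False

    def release(self, amount: float) -> None:
        self.allocated -= amount


def reserve(path: list[SwitchingElement], increase: float) -> bool:
    """Forward a Request hop by hop; undo upstream reservations on failure."""
    reserved: list[SwitchingElement] = []
    for element in path:
        if not element.try_allocate(increase):
            for done in reserved:  # resources reset to their previous state
                done.release(increase)
            return False           # Request discarded
        reserved.append(element)
    return True  # egress reached: acknowledgment returned to the FRP Unit


def release_path(path: list[SwitchingElement], amount: float) -> None:
    """Called when the source signals that an accepted burst has been transferred."""
    for element in path:
        element.release(amount)


if __name__ == "__main__":
    path = [SwitchingElement(150.0), SwitchingElement(150.0, allocated=120.0)]
    print(reserve(path, 20.0))            # True: both elements have room
    print(reserve(path, 20.0))            # False: the second element would exceed 150
    print([e.allocated for e in path])    # [20.0, 140.0]: the first hop was rolled back
```

Under FRP/DT a `False` result leads to a later reattempt. Under FRP/IT the same `False` means that the burst, already sent, is lost.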
3 MODEL DESCRIPTION AND ANALYSIS

In our analysis we consider an isolated node. We assume an ON-OFF model for the data sources with exponentially distributed ON and OFF times (burst-silence model). The parameters of a source are the bit rate within a burst period A, the mean burst duration t_on and the mean silence duration t_off.

In the model, N_l identical sources (we will refer to them as ltype sources) with parameters A_l, t_on^l and t_off^l are multiplexed with one different source (we will refer to it as the htype source) with parameters A_h, t_on^h and t_off^h.

Since all time intervals are exponentially distributed, the activation rate α of a source is given by α = 1/t_off. Let the service time be the time during which a node allocates bandwidth to a non-blocked source. Clearly, for FRP/IT the mean service time is the mean burst duration t_on. For FRP/DT a non-blocked source has to wait a deterministic time equal to the round trip time t_rt before transferring a burst, so the mean service time is t_on + t_rt. However, in this paper we do not study the influence of the round trip time, so we assume it to be zero. The service rate μ of a source is therefore μ = 1/t_on. With t_rt = 0, our model makes no distinction between FRP/DT and FRP/IT. Refer to Enssle (1994) for a comparison of the two variants of the protocol.

3.1 Approximation by zero time between reattempts

In this approximation we suppose that, when a burst is blocked, the time between the successive requests made until the burst is accepted is zero. This is equivalent to considering that a blocked burst is kept in a queue until the other sources leave enough bandwidth on the link.

Let K1 be the maximum number of ltype sources that can simultaneously transfer a burst without exceeding the link capacity when the htype source is also transferring a burst. Let K2 be the same quantity when the htype source is silent or blocked. Let us further suppose that the htype source transmits at a higher rate than the ltype sources, so that K2 ≥ K1 + 1. In this case, when an ltype source and the htype source are both blocked, the ltype source is accepted first (i.e. the htype source does not see a FIFO queue). Clearly, if the link capacity is C,

$$ K_1 = \left\lfloor \frac{C - A_h}{A_l} \right\rfloor \qquad (1) $$

$$ K_2 = \left\lfloor \frac{C}{A_l} \right\rfloor \qquad (2) $$

With these assumptions an isolated node can be described by the Markov chain of Figure 1, with state space {(i, j) : i = 0, 1, 2; 0 ≤ j ≤ N_l}. Here j is the number of active ltype sources (transferring or blocked), and i indicates whether the htype source is silent (i = 0), transferring a burst (i = 1) or blocked (i = 2). This Markov chain does not have a product form solution for the stationary probabilities π_ij, so they have to be calculated numerically by solving the global balance equations.

The ltype and htype blocking probabilities (P_l and P_h) can be obtained from the stationary probabilities π_ij. The blocking probability is the probability that an arriving burst is blocked, divided by the probability of a burst arrival. Thus

$$ P_h = \frac{\sum_{j=K_1+1}^{N_l} \pi_{0j}}{\sum_{j=0}^{N_l} \pi_{0j}} \qquad (3) $$

$$ P_l = \frac{\sum_{j=K_2}^{N_l-1} (N_l-j)\,\pi_{0j} + \sum_{j=K_1}^{N_l-1} (N_l-j)\,\pi_{1j} + \sum_{j=K_2}^{N_l-1} (N_l-j)\,\pi_{2j}}{\sum_{j=0}^{N_l-1} (N_l-j)\,\pi_{0j} + \sum_{j=0}^{N_l-1} (N_l-j)\,\pi_{1j} + \sum_{j=K_1+1}^{N_l-1} (N_l-j)\,\pi_{2j}} \qquad (4) $$
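The numerical solution of the global balance equations can be sketched in pure Python. Figure 1 is not reproduced here, so the transition structure below is our own reading of the model description (an assumption):

- ltype bursts arrive at rate (N_l − j)α_l and end at rate min(j, K)μ_l, with K = K1 while the htype source transfers and K = K2 otherwise;
- an arriving htype burst is accepted if j ≤ K1 and blocked otherwise;
- a blocked htype source is accepted as soon as j falls to K1, since ltype sources have priority.

K1 and K2 follow equations (1) and (2); the function names are hypothetical.

```python
import math


def k_values(C: float, A_l: float, A_h: float) -> tuple[int, int]:
    """K1 and K2 from equations (1) and (2)."""
    return math.floor((C - A_h) / A_l), math.floor(C / A_l)


def solve_linear(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def zero_reattempt_model(N_l, K1, K2, a_l, m_l, a_h, m_h):
    """Stationary probabilities pi[(i, j)] and blocking probabilities P_h, P_l."""
    states = [(i, j) for i in (0, 1) for j in range(N_l + 1)]
    states += [(2, j) for j in range(K1 + 1, N_l + 1)]  # htype blocked only if j > K1
    idx = {s: n for n, s in enumerate(states)}
    rates: dict = {}

    def add(src, dst, r):
        if r > 0.0:
            rates[(src, dst)] = rates.get((src, dst), 0.0) + r

    for i, j in states:
        if j < N_l:
            add((i, j), (i, j + 1), (N_l - j) * a_l)          # ltype burst arrival
        if i == 0:
            if j > 0:
                add((0, j), (0, j - 1), min(j, K2) * m_l)     # ltype burst ends
            add((0, j), (1, j) if j <= K1 else (2, j), a_h)   # htype accepted or blocked
        elif i == 1:
            if j > 0:
                add((1, j), (1, j - 1), min(j, K1) * m_l)
            add((1, j), (0, j), m_h)                          # htype burst ends
        else:  # i == 2: htype accepted as soon as j falls to K1
            add((2, j), (1, j - 1) if j - 1 <= K1 else (2, j - 1), min(j, K2) * m_l)

    n = len(states)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        A[idx[dst]][idx[src]] += r  # inflow into dst
        A[idx[src]][idx[src]] -= r  # outflow from src
    A[-1] = [1.0] * n               # replace one balance equation by normalisation
    b = [0.0] * (n - 1) + [1.0]
    pi = dict(zip(states, solve_linear(A, b)))

    p = lambda i, j: pi.get((i, j), 0.0)
    w = lambda i, j: (N_l - j) * p(i, j)
    P_h = sum(p(0, j) for j in range(K1 + 1, N_l + 1)) / sum(p(0, j) for j in range(N_l + 1))  # (3)
    num = (sum(w(0, j) for j in range(K2, N_l)) + sum(w(1, j) for j in range(K1, N_l))
           + sum(w(2, j) for j in range(K2, N_l)))
    den = sum(w(0, j) + w(1, j) + w(2, j) for j in range(N_l))
    return pi, P_h, num / den  # (4)


if __name__ == "__main__":
    K1, K2 = k_values(C=150.0, A_l=10.0, A_h=25.0)  # hypothetical rates: K1 = 12, K2 = 15
    _, P_h, P_l = zero_reattempt_model(20, K1, K2, a_l=0.5, m_l=1.0, a_h=0.2, m_h=0.5)
    print(f"P_h = {P_h:.3e}, P_l = {P_l:.3e}")
```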

3.2 Approximation by identically distributed reattempt and OFF times

In this approximation we assume that, when a burst is blocked, the time between the successive requests made until the burst is accepted is exponentially distributed with the same mean as the OFF time. In other words, the reattempt time and the OFF time are identically distributed. This is equivalent to considering that a blocked burst is lost.

Let K1 and K2 be as in the previous section. Because a blocked burst can be considered lost, with this approach an isolated node can be described by the Markov chain of Figure 2. Its state space is {(0, j) : 0 ≤ j ≤ K2} ∪ {(1, j) : 0 ≤ j ≤ K1}, where j is the number of ltype sources transferring a burst while the htype source is silent (i = 0) or transferring a burst (i = 1). The stationary probabilities π_ij of this Markov chain have a straightforward product form solution given by


$$ \pi_{ij} = \frac{1}{G} \binom{N_l}{j} \rho_l^{-j} \rho_h^{-i} \qquad (5) $$

where G is the normalization constant, ρ_l = μ_l/α_l and ρ_h = μ_h/α_h. We note that a product form solution would still apply with more than one htype source, or even with more than two source types.

In this model we make no distinction between a burst arrival and a reattempt arrival. We therefore calculate the blocking probability as the probability that a burst or reattempt arrival is blocked, divided by the probability of a burst or reattempt arrival. The blocking probabilities of the ltype and htype sources (P_l and P_h) are then given by

$$ P_h = \frac{\sum_{j=K_1+1}^{K_2} \pi_{0j}}{\sum_{j=0}^{K_2} \pi_{0j}} \qquad (6) $$

$$ P_l = \frac{(N_l-K_2)\,\pi_{0K_2} + (N_l-K_1)\,\pi_{1K_1}}{\sum_{j=0}^{K_2} (N_l-j)\,\pi_{0j} + \sum_{j=0}^{K_1} (N_l-j)\,\pi_{1j}} \qquad (7) $$
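Because (5) has product form, equations (6) and (7) can be evaluated directly, without solving a linear system. The sketch below is a minimal illustration under the assumptions of section 3.2; the function and parameter names are ours. It works with α/μ for each source type, which equals 1/ρ with the definitions given after (5), and it drops the constant G, which cancels in (6) and (7).

```python
from math import comb


def identical_reattempt_model(N_l, K1, K2, a_l, m_l, a_h, m_h):
    """Blocking probabilities (P_h, P_l) from equations (5)-(7)."""
    x_l, x_h = a_l / m_l, a_h / m_h  # alpha/mu = 1/rho for each source type
    # Unnormalised product-form weights of (5); G cancels in (6) and (7)
    w0 = [comb(N_l, j) * x_l**j for j in range(K2 + 1)]        # htype silent
    w1 = [comb(N_l, j) * x_l**j * x_h for j in range(K1 + 1)]  # htype transferring
    P_h = sum(w0[K1 + 1:]) / sum(w0)                                         # (6)
    num = (N_l - K2) * w0[K2] + (N_l - K1) * w1[K1]
    den = (sum((N_l - j) * w for j, w in enumerate(w0))
           + sum((N_l - j) * w for j, w in enumerate(w1)))
    return P_h, num / den                                                    # (7)


if __name__ == "__main__":
    P_h, P_l = identical_reattempt_model(20, 12, 15, a_l=0.5, m_l=1.0, a_h=0.2, m_h=0.5)
    print(f"P_h = {P_h:.3e}, P_l = {P_l:.3e}")
```

P_h in (6) does not depend on the htype parameters at all. This is consistent with the constant htype blocking probability reported in the results section.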


Note that in the previous section we did not count the reattempts when calculating the blocking probability, since a zero time between reattempts implies an infinite number of reattempts after a blocked burst. If the reattempt time is not zero, a relation holds between the blocking probabilities P_l^init and P_l^total of an ltype source, calculated respectively without and with counting the reattempts. Let r_l be the mean number of reattempts that a blocked burst of an ltype source makes until it is accepted. It can be shown that

$$ P_l^{init} = \frac{P_l^{total}}{r_l\,(1 - P_l^{total})} \qquad (8) $$

An analogous relation obviously holds for the htype source. Suppose the blocking probability is small and the reattempt time is long enough that r ≈ 1, i.e. a blocked burst is almost always accepted at the first reattempt. Then P^init ≈ P^total. These conditions can be expected in the approximation by identically distributed reattempt and OFF times. This approximation can therefore be used to assess both P^total and P^init from the probabilities calculated with equations (6) and (7).
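Relation (8) is easy to apply in both directions. In the helpers below, whose names are our own, r is the mean number of reattempts per blocked burst, including the successful one, as defined above. The inverse follows from counting attempts per burst: there are 1 + P^init·r attempts in total, of which P^init·r are blocked.

```python
def init_from_total(p_total: float, r: float) -> float:
    """Equation (8): blocking probability not counting reattempts."""
    return p_total / (r * (1.0 - p_total))


def total_from_init(p_init: float, r: float) -> float:
    """Inverse of (8): fraction of blocked attempts when reattempts are counted."""
    return p_init * r / (1.0 + p_init * r)


if __name__ == "__main__":
    # With r close to 1 and a small probability, the two measures nearly coincide
    print(init_from_total(0.001, 1.0))
```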
Figure 2: State-transition diagram assuming identically distributed reattempt and OFF times.


3.3 Blocking time in the approximation by zero time between reattempts

In this section we calculate the time that a blocked arriving burst has to wait until it is eventually accepted (we refer to it as the blocking time). We calculate this time under the approximation by zero time between reattempts, so the states referred to are those of Figure 1. We do not use the approximation by identically distributed reattempt and OFF times to assess the blocking time, because in general it would be inaccurate.

Let T_h and T_l be the blocking times of the htype source and of an ltype source, respectively. Let B_ij = (i, j) be the state entered as a result of the blocking transition. Clearly


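$$ P(T_h \le x) = \sum_{j=K_1+1}^{N_l} P(T_h \le x \mid B_{2j})\,P(B_{2j}), \qquad (9) $$

$$ P(B_{2j}) = \frac{\pi_{0j}}{\sum_{k=K_1+1}^{N_l} \pi_{0k}} \qquad (10) $$

and

$$ P(T_l \le x) = \sum_{\forall B_{ij}} P(T_l \le x \mid B_{ij})\,P(B_{ij}), \qquad (11) $$

$$ P(B_{ij}) = \frac{(N_l - j + 1)\,\pi_{i,j-1}}{\sum_{k=K_2}^{N_l-1} (N_l-k)\,\pi_{0k} + \sum_{k=K_1}^{N_l-1} (N_l-k)\,\pi_{1k} + \sum_{k=K_2}^{N_l-1} (N_l-k)\,\pi_{2k}} \qquad (12) $$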


P(T_h ≤ x | B_2j) is the distribution of the time that a blocked burst of the htype source has to wait until it is accepted, when the state entered in the blocking transition is B_2j. P(T_l ≤ x | B_ij) is the same for an ltype source. Formulas for these probabilities are derived in Appendices 1 and 2.
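Appendices 1 and 2 are not included in this section. As a hedged illustration only, the sketch below computes the mean of T_h under our own reading of the zero-reattempt model. While the htype source is blocked in state (2, j), ltype bursts arrive at rate (N_l − j)α_l and end at rate min(j, K2)μ_l, and the htype source is accepted when j falls to K1. The mean time D_j to go down one level then satisfies the standard birth-death first-passage recursion D_j = (1 + birth_j · D_{j+1}) / death_j. Equation (10) supplies the weights P(B_2j). This derivation is ours, not the authors' appendix, and the function names are hypothetical.

```python
def mean_htype_wait(N_l, K1, K2, a_l, m_l):
    """E[T_h | B_2j] for j = K1+1..N_l, returned as a dict j -> mean waiting time."""
    D = {}
    for j in range(N_l, K1, -1):                 # from the top state downwards
        birth = (N_l - j) * a_l                  # ltype burst arrivals
        death = min(j, K2) * m_l                 # ltype burst completions
        D[j] = (1.0 + birth * D.get(j + 1, 0.0)) / death
    waits, acc = {}, 0.0
    for j in range(K1 + 1, N_l + 1):
        acc += D[j]                              # go down j -> j-1 -> ... -> K1
        waits[j] = acc
    return waits


def mean_htype_blocking_time(pi0, N_l, K1, K2, a_l, m_l):
    """E[T_h] = sum_j P(B_2j) E[T_h | B_2j], with P(B_2j) from equation (10).

    pi0[j] is the stationary probability pi_0j of the zero-reattempt model.
    """
    waits = mean_htype_wait(N_l, K1, K2, a_l, m_l)
    norm = sum(pi0[j] for j in range(K1 + 1, N_l + 1))
    return sum(pi0[j] / norm * waits[j] for j in range(K1 + 1, N_l + 1))


if __name__ == "__main__":
    # Hypothetical stationary probabilities pi_0j for N_l = 3
    print(mean_htype_blocking_time({0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}, 3, 1, 5, 1.0, 1.0))
```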
4 RESULTS

In this section we present a numerical study of FRP fairness using the models described above. We evaluate the fairness of the protocol in terms of the burst blocking probability and the mean blocking time. Blocking time is especially important with the FRP/IT scheme, in which the sources are assumed to be time sensitive. We also compare analytical and simulation results.

Figures 3 and 4 (model parameters are summarized in Table 2) plot the blocking probability and the mean blocking time of the two source types as the mean burst duration of the htype source (i.e. its mean ON time t_on^h) is varied. As t_on^h varies from 0 (the source is always silent) to ∞ (the source is always active), the blocking probability of the ltype sources increases. It moves from the value obtained when sharing a link of capacity C to the value obtained with a link of capacity C − A_h. Figure 3 shows that the blocking probability of the ltype sources increases within these limits, while the blocking probability of the htype source remains constant. The blocking probability is assessed using both the approximation by zero time between reattempts (section 3.1) and the approximation by identically distributed reattempt and OFF times (section 3.2).

Figures 5 and 6 plot the blocking probability and the mean blocking time of the two source types as the bitrate of the htype source within a burst period varies. Each time the htype source bitrate reaches a multiple of the ltype source bitrate, the maximum number of sources that can simultaneously transfer a burst decreases. This causes a step increase in both the blocking probability and the blocking time.

Table 1 compares analytical and simulation results (the latter given with 95% confidence intervals). To compute the blocking probabilities in the simulation, reattempts were not counted when comparing with the approximation by zero time between reattempts (these probabilities are labelled "init." in the table), and were counted when comparing with the approximation by identical reattempt and OFF time distributions (labelled "tot." in the table, cf. section 3.2). Increasing the reattempt time decreases the blocking probability. Hence the first approximation can be regarded as an upper bound for the "init." probabilities and, for a reattempt time shorter than the mean OFF time, the second approximation can be regarded as a lower bound for the "tot." probabilities.

Both deterministic and exponentially distributed reattempt times have been considered in the simulation. It can be seen that the exponentially distributed approximation for the reattempt time gives accurate results for the blocking probabilities, but not for the blocking time. Simulation results show that the mean blocking time increases rapidly as the reattempt time increases.


† We note that the burstiness, defined as $b = (t_{on} + t_{off})/t_{on}$, is a decreasing function of $t_{on}$.
‡ To calculate the stationary probabilities using this approximation, we have solved the global balance equations using Gaussian elimination.

Table 1 Comparison of analytical and simulation results

                Analytical                  Simulation, reattempt time
                Zero time   Id. reatt.      5 ms                  20 ms                 50 ms
                between     and OFF         Exp. dist.  Det.      Exp. dist.  Det.      Exp. dist.  Det.
                reatt.      time dist.
P_h init.       7.63e-3     -               7.00e-3     6.86e-3   7.31e-3     7.28e-3   7.00e-3     7.16e-3
                                            ±2.83e-4    ±4.30e-4  ±5.33e-4    ±6.46e-4  ±2.49e-4    ±8.72e-4
P_l init.       8.22e-4     -               5.76e-4     5.92e-4   5.13e-4     4.98e-4   4.28e-4     4.31e-4
                                            ±1.19e-5    ±1.78e-5  ±4.89e-5    ±5.28e-5  ±1.10e-5    ±4.37e-5
P_h tot.        -           7.57e-3         29.7e-3     25.6e-3   14.4e-3     12.1e-3   10.1e-3     8.52e-3
                                            ±1.65e-3    ±1.75e-3  ±1.33e-3    ±1.34e-3  ±0.35e-3    ±1.12e-3
P_l tot.        -           4.15e-4         13.9e-4     12.4e-4   6.80e-4     5.90e-4   4.90e-4     4.47e-4
                                            ±2.32e-5    ±4.29e-5  ±6.36e-5    ±6.57e-5  ±1.09e-5    ±4.63e-5
T_h (ms)        15.24       -               22.0        19.13     40.8        33.6      73.2        59.9
                                            ±0.32       ±0.48     ±0.95       ±0.88     ±1.23       ±0.92
T_l (ms)        6.822       -               12.3        10.51     28.0        23.7      57.5        51.8
                                            ±0.07       ±0.16     ±0.24       ±0.22     ±0.41       ±0.16


5  CONCLUSIONS

We have analyzed the behaviour of the FRP when different source types are multiplexed together. We have considered the case in which a set of identical sources is multiplexed with another source of higher bitrate. To assess the blocking probability we have considered two approximations. In the first we assume that the reattempt time is zero, and in the second we assume that it is identically distributed to the OFF time. With the first approach the global balance equations have to be solved to obtain the stationary state probabilities, while with the second approach they have a simple product form solution. We have also calculated the mean blocking time under the first approach.

The numerical study shows that the blocking probabilities obtained with the two approximations do not differ greatly. The approximation of identical reattempt and OFF time distributions gives a much simpler way to compute the blocking probabilities and can easily be extended to more than one htype source, or even to more than two types of sources.

The results also show that when different types of sources are multiplexed, the blocking probability and the blocking time depend on the source parameters. This can be interpreted as a lack of fairness, in the sense that the sources do not get equal burst access. Indeed, an increase in the bitrate or the mean burst duration of one connection can considerably increase the blocking probability and blocking time of the other connections.

Recently the ATM Block Transfer Capability (ABT) has been defined, ITU (1995), with two variants, ABT/DT and ABT/IT, based on FRP/DT and FRP/IT respectively. In this recommendation a block-level QoS commitment is defined in which a reservation request should be accepted by the network within finite time limits (ABT/DT), or with a specified block discard probability (ABT/IT), as long as the blocks of the connection conform to the specified Sustainable Cell Rate. These QoS parameters can easily be derived from the blocking probability and blocking time parameters we have measured (we note that this ITU recommendation postdates the study carried out in this paper).
Figure 3 Influence of the htype source mean burst duration on the blocking probability

Figure 4 Influence of the htype source mean burst duration on the mean blocking time

Figure 5 Influence of the htype source bitrate within a burst period on the blocking probability

Figure 6 Influence of the htype source bitrate within a burst period on the mean blocking time


Table 2 Source parameters

Columns: link capacity; htype source: bitrate (Mbps), $t_{on}$ (ms), $t_{off}$ (ms), burstiness; ltype source: bitrate (Mbps), $t_{on}$ (ms), $t_{off}$ (ms), burstiness, number of sources.
APPENDIX  1

In this appendix we derive $P(T_h \le x \mid B_{2j})$, $j \in \{K_1+1, \ldots, N_l\}$, of expression 9. If both an htype and an ltype source are blocked, the ltype source is accepted first (i.e. the htype source does not see a FIFO queue), so $P(T_h \le x \mid B_{2j})$ is the distribution of the first passage time from the blocking state $B_{2j}$ to the non-blocking state $(1, K_1)$. To calculate this probability we follow the method described in Neuts (1989). We are only concerned with the states $(1, K_1), (2, K_1+1), \ldots, (2, N_l)$, so to simplify the notation we will refer to them as $E_j$, $j = K_1, K_1+1, \ldots, N_l$. We define the following quantities:

$T(j, j-r)$ = first passage time from state $E_j$ to state $E_{j-r}$
$V(j, j-r)$ = number of transitions involved in $T(j, j-r)$

and their joint probability $G_j^{(r)}(x,k) = P\{T(j,j-r) \le x,\ V(j,j-r) = k\}$. The probability we are looking for is given by

$$P(T_h \le x \mid B_{2j}) = \sum_{k=1}^{\infty} G_j^{(j-K_1)}(x,k) \tag{13}$$

We now compute $G_j^{(r)}(x,k)$. To simplify the notation, for a single-state transition we write $G_j(x,k) = G_j^{(1)}(x,k)$. Let $q_j^+$ be the transition rate from state $E_j$ to state $E_{j+1}$, $q_j^-$ the transition rate from state $E_j$ to state $E_{j-1}$, and $q_j$ the total rate out of state $E_j$ (cf. figure 1):

$$q_j^+ = (N_l - j)\,\lambda_l, \qquad q_j^- = \begin{cases} j\,\mu_l, & j < K_2 \\ K_2\,\mu_l, & j \ge K_2 \end{cases}, \qquad q_j = q_j^+ + q_j^-$$

We define the one-step backward and forward transition probabilities $A_j^-(x) = P\{T(j,j-1) \le x,\ V(j,j-1) = 1\}$ and $A_j^+(x) = P\{T(j,j+1) \le x,\ V(j,j+1) = 1\}$. We have

$$A_j^-(x) = \left(1 - e^{-q_j x}\right)\frac{q_j^-}{q_j}, \quad j = K_1+1, \ldots, N_l, \qquad A_j^+(x) = \left(1 - e^{-q_j x}\right)\frac{q_j^+}{q_j}, \quad j = K_1+1, \ldots, N_l-1 \tag{14}$$

yielding

$$G_j(x,k) = \begin{cases} A_j^-(x), & k = 1 \\ 0, & k = 2n \\ \sum_{l=1}^{n} A_j^+(\cdot) * G_{j+1}(\cdot,\, 2(n-l)+1) * G_j(x,\, 2l-1), & k = 2n+1 \end{cases} \tag{15}$$
and

$$G_j^{(r)}(x,k) = \sum_{k_1 + \cdots + k_r = k} G_j(\cdot, k_1) * G_{j-1}(\cdot, k_2) * \cdots * G_{j-r+1}(x, k_r) \tag{16}$$

where $*$ denotes the convolution of distribution functions (i.e. $F_1(\cdot) * F_2(x) = \int_0^x F_1(x - y)\,dF_2(y)$).

From 15 we derive the following recursive equation for the joint transform $\tilde G_j(s,z) = \sum_{k=0}^{\infty} z^k \int_0^{\infty} e^{-sx}\,dG_j(x,k)$:


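$$\tilde G_j(s,z) = \frac{z\,\tilde A_j^-(s)}{1 - z\,\tilde A_j^+(s)\,\tilde G_{j+1}(s,z)} \tag{17}$$

and

$$\tilde G_j^{(r)}(s,z) = \sum_{k=0}^{\infty} z^k \sum_{k_1 + \cdots + k_r = k} \tilde G_j(s,k_1)\,\tilde G_{j-1}(s,k_2)\cdots\tilde G_{j-r+1}(s,k_r) = \tilde G_j(s,z)\,\tilde G_{j-1}(s,z)\cdots\tilde G_{j-r+1}(s,z) \tag{18}$$

where $\tilde A_j^-(s)$, $\tilde A_j^+(s)$ and $\tilde G_j(s,k)$ are the Laplace-Stieltjes transforms of the corresponding distribution functions (i.e. $\tilde F(s) = \int_0^{\infty} e^{-sx}\,dF(x)$). From 14 we obtain

$$\tilde A_j^-(s) = \frac{q_j^-}{s + q_j}, \quad j = K_1+1, \ldots, N_l, \qquad \tilde A_j^+(s) = \frac{q_j^+}{s + q_j}, \quad j = K_1+1, \ldots, N_l-1 \tag{19}$$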

Substitution into 17 yields

$$\tilde G_j(s,z) = \frac{z\,q_j^-}{s + q_j - z\,q_j^+\,\tilde G_{j+1}(s,z)}, \quad j = K_1+1, \ldots, N_l-1 \tag{20}$$

$$\tilde G_{N_l}(s,z) = \frac{z\,q_{N_l}^-}{s + q_{N_l}} \tag{21}$$

Substituting 21 recursively into 20, and then into 18, we obtain $\tilde G_j^{(r)}(s,z)$, $j = K_1+1, \ldots, N_l$. Finally, from 13 we see that $\tilde G_j^{(j-K_1)}(s,z)\big|_{z=1}$ is the Laplace-Stieltjes transform of $P(T_h \le x \mid B_{2j})$. Inverting it and substituting into 9 yields the distribution of the blocking time $T_h$. This is rather arduous, but from the previous equations we can derive a straightforward formula for the mean blocking time $\bar T_h$.
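The transform recursion can be evaluated numerically. The sketch below is a minimal Python version under stated assumptions. The helper names are hypothetical, `lam_l` is a name chosen here for the rate at which a silent ltype source starts a burst, and $q_j^- = \min(j, K_2)\,\mu_l$ as in the rates above. It computes (21), then applies (20) downward from $N_l$, and multiplies the one-step transforms as in (18). At $s = 0$, $z = 1$ the result should be 1, because the passage to $E_{K_1}$ is certain.

```python
def rates(j, n_l, k2, lam_l, mu_l):
    """Return (q_j^+, q_j^-) for state E_j = (2, j); lam_l is an assumed name."""
    up = (n_l - j) * lam_l      # a silent ltype source starts a burst
    down = min(j, k2) * mu_l    # one of the min(j, K2) ltype bursts in progress ends
    return up, down


def one_step_transforms(s, z, n_l, k1, k2, lam_l, mu_l):
    """G~_j(s, z) for j = K1+1..N_l: (21) at j = N_l, then (20) downwards."""
    g = {}
    up, down = rates(n_l, n_l, k2, lam_l, mu_l)
    g[n_l] = z * down / (s + up + down)  # up == 0 at j = N_l
    for j in range(n_l - 1, k1, -1):
        up, down = rates(j, n_l, k2, lam_l, mu_l)
        g[j] = z * down / (s + up + down - z * up * g[j + 1])
    return g


def passage_transform(j, s, z, n_l, k1, k2, lam_l, mu_l):
    """G~_j^{(j-K1)}(s, z) as the product (18); at z = 1 this is the LST in (13)."""
    g = one_step_transforms(s, z, n_l, k1, k2, lam_l, mu_l)
    prod = 1.0
    for m in range(k1 + 1, j + 1):
        prod *= g[m]
    return prod


params = dict(n_l=10, k1=3, k2=6, lam_l=0.2, mu_l=1.0)  # illustrative values only
print(passage_transform(8, 0.0, 1.0, **params))  # passage is certain: 1 up to rounding
print(passage_transform(8, 0.5, 1.0, **params))  # LST of the passage time at s = 0.5
```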

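Let us define $\bar T_j^{(j-K_1)} = E[T(j, K_1)]$, i.e. the mean first passage time from state $E_j$ to state $E_{K_1}$. We also define $\bar T_j$ as the mean first passage time from state $E_j$ to state $E_{j-1}$. Clearly

$$\bar T_j^{(j-K_1)} = -\frac{\partial}{\partial s}\tilde G_j^{(j-K_1)}(s,z)\Big|_{s=0,\,z=1}, \qquad \bar T_j = -\frac{\partial}{\partial s}\tilde G_j(s,z)\Big|_{s=0,\,z=1}.$$

From 20, 21 and 18, and since $\tilde G_j(s,z)\big|_{s=0,\,z=1} = 1$, we obtain

$$\bar T_j = \frac{1 + q_j^+\,\bar T_{j+1}}{q_j^-}, \quad j = K_1+1, \ldots, N_l-1 \tag{22}$$

$$\bar T_{N_l} = \frac{1}{q_{N_l}^-} \tag{23}$$

$$\bar T_j^{(j-K_1)} = \sum_{k=K_1+1}^{j} \bar T_k \tag{24}$$

Substituting 23 recursively into 22, and then into 24, we compute $\bar T_j^{(j-K_1)}$, and finally from 9 we obtain the mean blocking time

$$\bar T_h = \sum_{j=K_1+1}^{N_l} \bar T_j^{(j-K_1)}\,P(B_{2j}) \tag{25}$$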
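The mean-value recursion (22)-(25) needs only the transition rates. The following is a minimal sketch under the same assumptions as above: `lam_l` is again an assumed name for the ltype burst-start rate, and $q_j^- = \min(j, K_2)\,\mu_l$. $P(B_{2j})$ is taken as an input because it comes from the stationary analysis of the main text. The two-state example and the probabilities passed to `mean_blocking_time_h` are illustrative only.

```python
def mean_step_times(n_l, k1, k2, lam_l, mu_l):
    """Tbar_j for j = K1+1..N_l via (23) and the backward recursion (22)."""
    def rates(j):
        return (n_l - j) * lam_l, min(j, k2) * mu_l  # (q_j^+, q_j^-)

    t = {}
    _, down = rates(n_l)
    t[n_l] = 1.0 / down                              # (23)
    for j in range(n_l - 1, k1, -1):
        up, down = rates(j)
        t[j] = (1.0 + up * t[j + 1]) / down          # (22)
    return t


def mean_passage_times(n_l, k1, k2, lam_l, mu_l):
    """Tbar_j^{(j-K1)} = sum_{k=K1+1}^{j} Tbar_k, equation (24)."""
    t = mean_step_times(n_l, k1, k2, lam_l, mu_l)
    out, acc = {}, 0.0
    for j in range(k1 + 1, n_l + 1):
        acc += t[j]
        out[j] = acc
    return out


def mean_blocking_time_h(p_b2, n_l, k1, k2, lam_l, mu_l):
    """Equation (25); p_b2[j] = P(B_2j) must come from the stationary analysis."""
    passage = mean_passage_times(n_l, k1, k2, lam_l, mu_l)
    return sum(passage[j] * p for j, p in p_b2.items())


# Two-state example: N_l = 4, K1 = 2, K2 = 3, lam_l = 0.5, mu_l = 1 (illustrative values).
print(mean_passage_times(4, 2, 3, 0.5, 1.0))
print(mean_blocking_time_h({3: 0.5, 4: 0.5}, 4, 2, 3, 0.5, 1.0))
```

The example can be checked by hand: $\bar T_4 = 1/3$ and $\bar T_3 = (1 + 0.5/3)/3 = 7/18$, so the passage times from $E_3$ and $E_4$ are $7/18$ and $13/18$.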


APPENDIX  2 


In this appendix we derive $P(T_l \le x \mid B_{ij})$ of expression 11. To calculate this probability we consider the following cases:

1. $B_{ij} \in \{(2, K_2+1), \ldots, (2, N_l), (0, K_2+1), \ldots, (0, N_l)\}$

In this case the ltype source is blocked while the htype source is silent or blocked. Let $B_{ij} = (i,j)$ be the state resulting from the blocking transition. The source will find $j - K_2 - 1$ ltype sources already blocked, and it will have to wait until $j - K_2$ ltype sources are served (we say that a source is served when one of the $K_2$ bursts being transferred ends). Let $S_l^{(n)}$ be the service time of $n$ ltype sources and $F_{S_l}^{(n)}(x)$ its distribution. Clearly $F_{S_l}^{(1)}(x) = 1 - e^{-K_2\mu_l x}$ and $F_{S_l}^{(n)}(x) = F_{S_l}^{(1)}(\cdot) * \cdots * F_{S_l}^{(1)}(x)$ ($n$-fold convolution). Let $\tilde F_{T_l|B_{ij}}(s)$ be the Laplace-Stieltjes transform of $P(T_l \le x \mid B_{ij})$. We have

$$\tilde F_{T_l|B_{ij}}(s) = \tilde F_{S_l}^{(j-K_2)}(s) = \frac{(K_2\mu_l)^{j-K_2}}{(s + K_2\mu_l)^{j-K_2}} \tag{26}$$


2. $B_{ij} \in \{(1, K_1+1), \ldots, (1, K_2)\}$

In this case the ltype source is blocked while the htype source is transferring a burst, but the total number of ltype sources does not exceed $K_2$. Let $B_{ij} = (i,j)$ be the state resulting from the blocking transition. The source will find $j - K_1 - 1$ ltype sources already blocked. So, to be accepted, it will have to wait until either the htype source is served or $j - K_1$ ltype sources are served. Let $S_h$ be the service time of the htype source and $S_l^{(n)}$ the service time of $n$ ltype sources, and let $F_{S_h}(x)$ and $F_{S_l}^{(n)}(x)$ be their distributions. Clearly $F_{S_h}(x) = 1 - e^{-\mu_h x}$, and $F_{S_l}^{(n)}(x)$ is the same as in the previous case, but with $K_2$ replaced by $K_1$. We have

$$P(T_l \le x \mid B_{ij}) = 1 - P(S_h > x)\,P(S_l^{(j-K_1)} > x) = 1 - \left(1 - F_{S_h}(x)\right)\left(1 - F_{S_l}^{(j-K_1)}(x)\right) = 1 - \left(1 - F_{S_l}^{(j-K_1)}(x)\right)e^{-\mu_h x} \tag{27}$$

The Laplace-Stieltjes transform of the previous equation is

$$\tilde F_{T_l|B_{ij}}(s) = s\int_0^{\infty} e^{-sx}\left(1 - \left(1 - F_{S_l}^{(j-K_1)}(x)\right)e^{-\mu_h x}\right)dx = 1 - \frac{s}{s + \mu_h}\left[1 - \left(\frac{K_1\mu_l}{s + \mu_h + K_1\mu_l}\right)^{j-K_1}\right] \tag{28}$$
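For cases 1 and 2 the mean conditional blocking times have simple closed forms. Differentiating (26) gives $(j - K_2)/(K_2\mu_l)$, and differentiating (28) at $s = 0$ gives $\left[1 - \left(K_1\mu_l/(\mu_h + K_1\mu_l)\right)^{j-K_1}\right]/\mu_h$. The sketch below codes the transforms and checks these closed forms against a central-difference derivative. The function names and the rates used are hypothetical and serve only as an illustration.

```python
def lst_case1(s, j, k2, mu_l):
    """Equation (26): j - K2 exponential services at rate K2 * mu_l."""
    rate = k2 * mu_l
    return (rate / (s + rate)) ** (j - k2)


def lst_case2(s, j, k1, mu_l, mu_h):
    """Equation (28): minimum of the htype service and j - K1 ltype services."""
    r = k1 * mu_l / (s + mu_h + k1 * mu_l)
    return 1.0 - s / (s + mu_h) * (1.0 - r ** (j - k1))


def mean_case1(j, k2, mu_l):
    return (j - k2) / (k2 * mu_l)


def mean_case2(j, k1, mu_l, mu_h):
    return (1.0 - (k1 * mu_l / (mu_h + k1 * mu_l)) ** (j - k1)) / mu_h


def mean_from_lst(lst, h=1e-5):
    """-dF~/ds at s = 0 by a central difference."""
    return -(lst(h) - lst(-h)) / (2.0 * h)


# Illustrative values: K1 = 2, K2 = 3, mu_l = 1, mu_h = 0.5.
m1 = mean_from_lst(lambda s: lst_case1(s, 6, 3, 1.0))
m2 = mean_from_lst(lambda s: lst_case2(s, 4, 2, 1.0, 0.5))
print(m1, mean_case1(6, 3, 1.0))
print(m2, mean_case2(4, 2, 1.0, 0.5))
```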


3. $B_{ij} \in \{(1, K_2+1), \ldots, (1, N_l)\}$

In this case the ltype source is blocked while the htype source is transferring a burst, but there are more than $K_2$ ltype sources. So even after the htype source is served, the ltype source may remain blocked. Let $S_h$ be the service time of the htype source and $S_l^{(n)}$ the service time of $n$ ltype sources while the htype source is being served. Let us consider the density of the blocking time. For convenience of notation we define $P_{ij}(T_l = x) = P(T_l = x \mid B_{ij})$. Clearly

$$P_{ij}(T_l = x) = P_{ij}(T_l = x,\ S_h < S_l^{(1)}) + \sum_{k=1}^{j-K_2-1} P_{ij}(T_l = x,\ S_l^{(k)} < S_h < S_l^{(k+1)}) + P_{ij}(T_l = x,\ S_h > S_l^{(j-K_2)}) \tag{29}$$

After some computation, the Laplace transform of the previous expression yields

$$\tilde F_{T_l|B_{ij}}(s) = \sum_{k=0}^{j-K_2-1} \frac{\mu_h}{K_1\mu_l}\left(\frac{K_1\mu_l}{s + \mu_h + K_1\mu_l}\right)^{k+1}\left(\frac{K_2\mu_l}{s + K_2\mu_l}\right)^{j-k-K_2} + \left[1 - \frac{s}{s + \mu_h}\left(1 - \left(\frac{K_1\mu_l}{s + \mu_h + K_1\mu_l}\right)^{K_2-K_1}\right)\right]\left(\frac{K_1\mu_l}{s + \mu_h + K_1\mu_l}\right)^{j-K_2} \tag{30}$$

Inversion of 26, 28 and 30, and substitution into 11, yields the distribution of the blocking time $T_l$. Differentiating these equations we calculate the mean blocking time

$$\bar T_l = \sum_{\forall B_{ij}} \left(-\frac{d}{ds}\tilde F_{T_l|B_{ij}}(s)\Big|_{s=0}\right) P(B_{ij}) \tag{31}$$
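Equation (30) can be checked against a direct simulation of the race described in case 3. While the htype burst lasts, ltype bursts end at rate $K_1\mu_l$ and the htype burst ends at rate $\mu_h$. Afterwards, ltype bursts end at rate $K_2\mu_l$. The sketch below rests on a few assumptions: the names are hypothetical, the rates are illustrative, burst durations are exponential, and blocked ltype sources are served in FIFO order, as in this appendix. It compares the mean obtained by numerically differentiating (30), as in (31), with a Monte Carlo estimate. Weighting such conditional means by $P(B_{ij})$ gives $\bar T_l$.

```python
import random


def lst_case3(s, j, k1, k2, mu_l, mu_h):
    """Equation (30) for B_ij = (1, j), j > K2."""
    a, b = k1 * mu_l, k2 * mu_l
    r = a / (s + mu_h + a)
    total = 0.0
    for k in range(j - k2):  # htype burst ends after k ltype services, k = 0..j-K2-1
        total += mu_h / a * r ** (k + 1) * (b / (s + b)) ** (j - k - k2)
    # htype still active after j - K2 services: case-2 race with K2 - K1 services left
    tail = 1.0 - s / (s + mu_h) * (1.0 - r ** (k2 - k1))
    return total + tail * r ** (j - k2)


def simulate_case3(j, k1, k2, mu_l, mu_h, rng):
    """One blocking time of the ltype source that entered state (1, j)."""
    t, served = 0.0, 0
    while True:
        rate = mu_h + k1 * mu_l
        t += rng.expovariate(rate)
        if rng.random() < mu_h / rate:       # htype burst ends first
            waiting = (j - served) - k2      # services still needed at rate K2 * mu_l
            for _ in range(max(0, waiting)):
                t += rng.expovariate(k2 * mu_l)
            return t
        served += 1                          # an ltype burst ends, htype still active
        if j - served <= k1:                 # our source enters while htype is active
            return t


params = dict(j=7, k1=2, k2=4, mu_l=1.0, mu_h=0.5)
h = 1e-5
mean_lst = -(lst_case3(h, **params) - lst_case3(-h, **params)) / (2 * h)
rng = random.Random(1)
n = 100_000
mean_mc = sum(simulate_case3(rng=rng, **params) for _ in range(n)) / n
print(mean_lst, mean_mc)
```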
Abstract

In some B-ISDN applications running on ATM networks (e.g., audio/video connections), the occasional loss of a single ATM cell may not affect the user's perceived QoS. However, the QoS may be degraded by the loss of multiple (consecutive) ATM cells. As consecutive cell loss is (typically) a rare event, its probability cannot be estimated efficiently using standard simulation. In this paper we propose a fast simulation method, based on importance sampling, to efficiently estimate the probability of a rare consecutive-cell-loss event. As an example, we consider a queueing model of the Leaky Bucket source policing algorithm operating in a bursty traffic environment. We present empirical results to demonstrate the validity and effectiveness of our fast simulation method.


Keywords 

Rare  event  simulation,  Importance  sampling,  Cell  loss,  ATM  networks,  Quality  of  service 
1  INTRODUCTION


In an Asynchronous Transfer Mode (ATM) network, data is transported in fixed-size cells. A cell loss may occur for a variety of reasons, such as buffer overflow in one or more of the network nodes, or traffic policing at the interface between the user and the network. In any case, the impact of a cell loss on the quality of service (QoS) provided by a given connection depends on the application and its resilience to such a loss.

Due to the bursty nature of traffic generated by broadband applications (e.g., multimedia and video conferencing), cells are likely to be lost in multiples (i.e., more than one consecutively arriving cell is lost). For example, a buffer overflow at a network node (even if rare) may result in the loss of many consecutive cells. Recovery techniques, such as cell retransmission, may be implemented at the communication protocol level or at the application level. In some applications (such as packet audio/video communication), the occasional loss of one or a few cells may not influence the QoS. Also, extrapolation and/or error correcting techniques can be used to compensate for such cell loss. However, in the absence of cell retransmission or other adequate recovery procedures, the loss of consecutive ATM cells may lead to a noticeable or intolerable degradation of QoS. Therefore, for most applications, it is important to keep consecutive cell loss as rare as possible. This is particularly true for applications with bursty traffic, for which the frequency of consecutive cell loss tends to be (relatively) high. The number of consecutive cell losses that can be tolerated without affecting the QoS depends on the application and/or the supporting recovery (or error correcting) mechanism, if any. For a given application, it is desirable to keep the frequency of losing more than a certain (tolerable) number of consecutive cells below some acceptable threshold. This frequency may be defined as the reciprocal of the steady-state average number of cells between such consecutive-cell-loss events. In a simple queueing model with a finite buffer, this frequency is closely related to another measure of interest, namely, the probability of consecutive cell loss in, say, a busy cycle.

Needless to say, the development of models for the analysis of consecutive cell loss is of much interest for the proper dimensioning of buffers and other network control parameters. To the best of our knowledge, there have so far been no analytical results for this problem. For a simple M/M/1 queue with a finite buffer, we derive analytic closed-form expressions for the frequency of consecutive cell loss and the probability of its occurrence in a busy cycle (see Section 2.2). However, for a GI/GI/1 queue the analysis is considerably more difficult, and a useful analytical or algorithmic solution, if at all possible, is not yet available. For the typically correlated and bursty arrival processes, the feasibility of a useful analysis seems even more remote. Furthermore, the probabilities of interest are typically very small, leading to numerical problems.

In order to avoid the restrictions necessary for analytic tractability and/or numerical feasibility, simulation is often preferred for the evaluation of realistic models. However, accurate estimation of the frequency of rare events, such as consecutive cell loss, requires observing numerous such events. If the frequency of consecutive cell loss is $10^{-9}$ per cell, then a consecutive-cell-loss event takes place approximately once every $10^{9}$ cells, and observing a sufficiently large number of such events requires an extremely long simulation time.

Importance sampling (Hammersley and Handscomb 1964) has been used effectively to achieve significant speed-ups in simulations involving rare events, such as failure in a reliable computer system or cell loss in an ATM communication network. See Nicola et al. (1993) for a review of techniques for fast simulation of highly dependable systems, and Heidelberger (1993) for a survey of efficient simulation methods to estimate buffer overflow probabilities in communication systems. The basic idea of importance sampling is to simulate the system under a different probability measure (i.e., with different underlying probability distributions), so as to increase the probability of typical sample paths involving the rare event of interest. For each sample path (observation) during the simulation, the measure being estimated is multiplied by a correction factor, called the likelihood ratio, to obtain an unbiased estimate of the measure in the original system. Asymptotically optimal changes of measure (for use in importance sampling) have been found to estimate small probabilities of buffer overflow in relatively simple queueing models (see Parekh and Walrand (1989), Sadowsky (1991), Chang et al. (1993) and others). In this paper, we develop heuristics, partly based on these optimal changes of measure, to estimate very small consecutive-cell-loss probabilities in simple GI/GI/1/k queues (k is the buffer capacity, including the server). We use our heuristics to evaluate a queueing model of the Leaky Bucket (LB) algorithm (see Rathgeb (1991)). Two cell arrival processes are considered, namely, a Poisson process (mainly for validation and experimentation) and a bursty two-phase burst/silence process (see Section 4.4). Empirical results demonstrate the effectiveness of our method in estimating very small consecutive-cell-loss probabilities. These results also show that the simulation time needed to achieve a given accuracy increases (albeit slightly) with the number of consecutive cell losses. This increase is attributed to the inherent increase in variability of the probability being estimated, rather than to the rarity of the event.

The rest of this paper is organized as follows. In Section 2, we introduce some notation relevant to the study of consecutive cell loss in simple queues, and we carry out the analysis for the M/M/1/k queue. In Section 2.3, we briefly introduce the problem of rare event simulation and review the basic idea of importance sampling. Changes of measure used in importance sampling to speed up simulations of simple queues are presented in Section 3; both a rare full-buffer event and a rare consecutive-cell-loss event are considered. Validation and experiments with our heuristic change of measure to simulate a queueing model of the LB algorithm are presented in Section 4. Conclusions are given in Section 5.


2  CONSECUTIVE  CELL  LOSS  IN  SIMPLE  QUEUES 


In this section we give brief preliminaries and notation needed for the discussion of consecutive cell loss in simple queues. For an M/M/1/k queue, i.e., Poisson cell arrivals and exponential service time distribution, the analysis is not complicated and is carried out in this section. The results of this analysis are used in Section 4 to validate statistical output obtained from simulation. For general inter-arrival and/or service time distributions, the analysis is considerably more difficult and is not considered here.


2.1  Preliminaries 

Consider a GI/GI/1/k queue ($k$ is the buffer capacity, including the server). The probability density function (pdf) of the inter-arrival (resp., service) time is given by $f_a(t)$ (resp., $f_s(t)$). Define the $n$-consecutive-cell-loss event to be the (cell arrival) event at which exactly $n$ consecutive cells are lost during a single full-buffer (or overflow) period. (Note that more than $n$ cells may be lost during the same overflow period.) We are interested in the steady-state frequency of this event, i.e., the reciprocal of the average number of arriving cells between two successive $n$-consecutive-cell-loss events; this is denoted by $\mathcal{F}_n$. A closely related measure of interest is the probability of $n$ or more consecutive cell losses in a busy cycle; this is denoted by $\gamma_n$.

Let $N(t)$ be the number of items (cells) in the queue (including the one in service) at time $t$, and denote by $t_j$, $j = 0, 1, 2, \ldots$, the consecutive instants at which $N(t)$ jumps from 0 to 1, i.e., for all $j = 0, 1, 2, \ldots$, $N(t_j^-) = 0$ and $N(t_j^+) > 0$. Define a busy cycle to be the evolution of the process $N(t)$ between two such consecutive instants, say, $t_j$ and $t_{j+1}$. Note that the $t_j$, $j = 0, 1, 2, \ldots$, constitute renewal points, and, therefore, busy cycles are i.i.d. (independent and identically distributed). The length of a busy cycle is a r.v. $T$; for the $j$-th busy cycle $T_j = t_j - t_{j-1}$, $j = 1, 2, \ldots$. The number of arrivals during a busy cycle is a r.v. $N$ which, because of buffer overflow, is not necessarily equal to the number of departures in the same busy cycle; for the $j$-th busy cycle it is denoted by $N_j$. Furthermore, denote by $O_{nj}$ the number of full-buffer periods in the $j$-th cycle during which $n$ or more cells are lost; $O_{nj}$ is a realization of the random variable $O_n$. It follows that the reciprocal of the long-run (steady-state) average number of arriving cells between two $n$-consecutive-cell-loss events, i.e., the frequency $\mathcal{F}_n$, is given by

$$\mathcal{F}_n = \frac{E(O_n)}{E(N)}. \tag{1}$$


Usually, an analytic (or numerical) solution for $E(N)$ can be determined. In particular, for an M/G/1/k queue it is simply given by $1/p_0$, where $p_0$ is the steady-state probability that the server is idle (see, for example, Cooper (1981)). The analysis of $E(O_n)$ is considerably more complicated, mainly because the length of a full-buffer period depends on the sample path (within a busy cycle) leading to that full buffer. For example, in an M/G/1/k queue, full-buffer periods in the same busy cycle are independent, but the first full-buffer period has a different distribution from that of the second and all subsequent full-buffer periods. However, in an M/M/1/k queue, all full-buffer periods are independent and have the same exponential (service time) distribution, regardless of the sample path leading to the full buffer. This independence yields significant simplifications, leading to the analytical results obtained in the following section.


2.2  Analysis  of  the  M/M/1/k  Queue

Consider an M/M/1/k queue with arrival rate $\lambda$ and service rate $\mu$. A busy cycle is defined as above. Define $\pi_i$, $0 \le i \le k$, as the probability that the number in the system, $N(t)$, moves from level $i$ to level $k$ without hitting level 0. In other words, given that $N(t) = i$, $\pi_i$ is the probability that the full-buffer state will be reached before the end of the busy cycle. Let $\gamma$ be the probability of at least one full-buffer period in a busy cycle. Furthermore, given a full buffer, let $\phi$ be the probability of yet another full-buffer period in the same busy cycle. It follows that $\gamma = \pi_1$ and $\phi = \pi_{k-1}$. The probabilities $\pi_i$, $0 \le i \le k$, can be determined from the following equations

$$\pi_i = \frac{\mu}{\lambda + \mu}\,\pi_{i-1} + \frac{\lambda}{\lambda + \mu}\,\pi_{i+1}, \qquad 1 \le i \le k-1, \tag{2}$$

with $\pi_0 = 0$ and $\pi_k = 1$. It follows that, for $\lambda \ne \mu$,

$$\pi_i = \frac{1 - (\mu/\lambda)^i}{1 - (\mu/\lambda)^k}, \qquad 1 \le i \le k-1. \tag{3}$$
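Equation (3) is the gambler's-ruin solution of (2). The sketch below evaluates it and checks the residual of (2). The symmetric case $\lambda = \mu$, whose solution is $\pi_i = i/k$, is added here for completeness and is not part of (3). The function name and the parameter values are illustrative.

```python
def full_buffer_probs(lam, mu, k):
    """pi_i, i = 0..k: probability of reaching level k before level 0 from level i."""
    if lam == mu:
        return [i / k for i in range(k + 1)]                       # symmetric walk
    x = mu / lam
    return [(1.0 - x ** i) / (1.0 - x ** k) for i in range(k + 1)]  # equation (3)


lam, mu, k = 0.8, 1.0, 10
pi = full_buffer_probs(lam, mu, k)
gamma, phi = pi[1], pi[k - 1]   # gamma = pi_1, phi = pi_{k-1}
# Largest residual of (2) over 1 <= i <= k-1; it should be at rounding level.
res = max(abs(pi[i] - (mu * pi[i - 1] + lam * pi[i + 1]) / (lam + mu)) for i in range(1, k))
print(gamma, phi, res)
```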
Now, let $p_n$ be the probability of $n$ or more arrivals (i.e., $n$ or more consecutive losses) in a single full-buffer period. Since full-buffer periods are independent and have the same exponential distribution with mean $1/\mu$, it follows that

$$p_n = \left(\frac{\lambda}{\lambda + \mu}\right)^n. \tag{4}$$

$P(O_n \ge i)$ is the probability, in a busy cycle, of $i$ or more full-buffer periods during each of which there are $n$ or more (lost) arrivals. The probability of (at least one) $n$-consecutive-cell-loss in a busy cycle, $\gamma_n$, is given by

$$\gamma_n = P(O_n \ge 1) = \gamma \sum_{k=1}^{\infty} \phi^{k-1}(1 - p_n)^{k-1}\,p_n = \frac{\gamma\,p_n}{1 - \phi(1 - p_n)}. \tag{5}$$


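Also, define $\phi_n$ to be the probability of another $n$-consecutive-cell-loss in the same busy cycle. Then

$$\phi_n = \sum_{k=1}^{\infty} \phi^{k}(1 - p_n)^{k-1}\,p_n = \frac{\phi\,p_n}{1 - \phi(1 - p_n)}. \tag{6}$$

It follows that

$$P(O_n \ge i) = \gamma_n\,\phi_n^{\,i-1}, \qquad i \ge 1, \tag{7}$$

and

$$E(O_n) = \frac{\gamma_n}{1 - \phi_n} = \frac{\gamma\,p_n}{1 - \phi}. \tag{8}$$

Note that for a sufficiently large number of consecutive losses $p_n \ll 1$ and $E(O_n) \approx \gamma_n$.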
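The sketch below chains (4)-(8) and (1) for the M/M/1/k queue, together with a small Monte Carlo check of $E(O_n)$. It makes two assumptions. First, $E(N) = 1/p_0$ as stated in Section 2.1, with the standard M/M/1/k idle probability $p_0 = (1-\rho)/(1-\rho^{k+1})$, $\rho = \lambda/\mu$ (for $\lambda \ne \mu$). Second, the simulation follows the embedded jump chain, which suffices because counting lost arrivals depends only on the order of arrivals and departures. The function names are hypothetical.

```python
import random


def consecutive_loss_measures(lam, mu, k, n):
    """Closed forms (3)-(8) for the M/M/1/k queue, plus F_n from (1)."""
    if lam == mu:
        pi = lambda i: i / k
        p0 = 1.0 / (k + 1)
    else:
        x, rho = mu / lam, lam / mu
        pi = lambda i: (1.0 - x ** i) / (1.0 - x ** k)
        p0 = (1.0 - rho) / (1.0 - rho ** (k + 1))        # M/M/1/k idle probability
    gamma, phi = pi(1), pi(k - 1)
    p_n = (lam / (lam + mu)) ** n                         # (4)
    gamma_n = gamma * p_n / (1.0 - phi * (1.0 - p_n))     # (5)
    phi_n = phi * p_n / (1.0 - phi * (1.0 - p_n))         # (6)
    mean_on = gamma_n / (1.0 - phi_n)                     # (8)
    return {"gamma": gamma, "phi": phi, "p_n": p_n, "gamma_n": gamma_n,
            "phi_n": phi_n, "E_On": mean_on, "F_n": mean_on * p0}  # (1), E(N) = 1/p0


def simulate_mean_on(lam, mu, k, n, cycles, seed=0):
    """Monte Carlo estimate of E(O_n) over busy cycles (embedded jump chain)."""
    rng = random.Random(seed)
    p_arr = lam / (lam + mu)
    total = 0
    for _ in range(cycles):
        level, count = 1, 0                  # a busy cycle starts at level 1
        while level > 0:
            if level < k:
                level += 1 if rng.random() < p_arr else -1
            else:                            # full buffer: count the lost arrivals
                lost = 0
                while rng.random() < p_arr:
                    lost += 1
                if lost >= n:
                    count += 1
                level = k - 1                # the full-buffer period ends with a departure
        total += count
    return total / cycles


m = consecutive_loss_measures(0.9, 1.0, 4, 2)
est = simulate_mean_on(0.9, 1.0, 4, 2, cycles=100_000, seed=1)
print(m["E_On"], est)
```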

The above analysis is not valid for other queues, such as M/G/1/k and GI/M/1/k. Appropriate analysis techniques may be developed for these queues; this is a subject for further investigation and is not considered in this paper. For these and other GI/GI/1/k queues, we use simulation to estimate $E(O_n)$ and/or $\gamma_n$. However, because the $n$-consecutive-cell-loss is typically a rare event, $E(O_n)$ and $\gamma_n$ are very small quantities that are difficult to estimate using standard simulation. In the next section, we develop fast simulation methods, based on importance sampling, to efficiently estimate $\gamma_n$ and/or $E(O_n)$. These methods can be validated by comparing statistical output from simulations of the M/M/1/k queue with the above analytical results.
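The validation idea in the last sentence can be sketched for $\gamma$, the probability of reaching a full buffer in a busy cycle. The change of measure used below interchanges the arrival and service rates. This is the well-known choice for M/M/1 overflow problems and is used here only as an illustration; it is not necessarily the heuristic developed in Section 3. Because $\lambda + \mu$ is unchanged, holding times have the same distribution under both measures, so the likelihood ratio (defined formally in Section 2.3) only involves the jump probabilities of the embedded chain. The names and parameters are hypothetical.

```python
import random


def estimate_gamma_is(lam, mu, k, cycles, seed=0):
    """Importance-sampling estimate of gamma = P(reach level k before 0 | start at 1).

    Under the new measure g the arrival and service rates are interchanged.
    Returns the estimate and its relative error (99% CI half width / estimate).
    """
    rng = random.Random(seed)
    p_up_f = lam / (lam + mu)  # original measure f
    p_up_g = mu / (lam + mu)   # new measure g (rates swapped)
    s1 = s2 = 0.0
    for _ in range(cycles):
        level, lr = 1, 1.0
        while 0 < level < k:
            if rng.random() < p_up_g:
                level += 1
                lr *= p_up_f / p_up_g
            else:
                level -= 1
                lr *= (1.0 - p_up_f) / (1.0 - p_up_g)
        sample = lr if level == k else 0.0  # I(T_fb < T) * L
        s1 += sample
        s2 += sample * sample
    mean = s1 / cycles
    var = s2 / cycles - mean * mean
    rel_err = 2.58 * (var / cycles) ** 0.5 / mean if mean > 0 else float("inf")
    return mean, rel_err


lam, mu, k = 0.5, 1.0, 20
exact = (1.0 - mu / lam) / (1.0 - (mu / lam) ** k)  # gamma = pi_1 from (3)
est, re = estimate_gamma_is(lam, mu, k, cycles=10_000, seed=1)
print(exact, est, re)
```

Here $\gamma \approx 9.5 \times 10^{-7}$. Under the swapped measure, however, roughly half of the simulated cycles reach the full buffer, which is why $10^4$ cycles already give a relative error of a few percent.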
2.3  Importance  Sampling


In a GI/GI/1/k queue, let us consider the estimation of the probability of reaching a full buffer in a busy cycle, $\gamma$ (see Section 2 for notation). This probability can be expressed as $\gamma = E_f(I(T_{fb} < T))$, where $T_{fb}$ is a r.v. denoting the time to reach a full buffer in a busy cycle, and $T$ is a r.v. denoting the cycle time (as defined in Section 2); $I(\cdot)$ is the indicator function. Note that $T_{fb} = \infty$ for a busy cycle in which the buffer is never full. The subscript $f$ denotes the underlying original probability measure (i.e., the original arrival and service processes). Using standard simulation, we generate $n$ independent busy cycles to obtain samples of $I(T_{fb} < T)$, say, $I_1, I_2, \ldots, I_n$. Then $\hat\gamma = \sum_{i=1}^{n} I_i/n$ is an unbiased estimator of $\gamma$. The variance of this estimator is given by $\mathrm{Var}_f(I(T_{fb} < T))/n$, where $\mathrm{Var}_f(I(T_{fb} < T)) = E_f(I^2(T_{fb} < T)) - E_f^2(I(T_{fb} < T)) = \gamma - \gamma^2$. From the central limit theorem (CLT) we have $\sqrt{n}\,(\hat\gamma - \gamma) \to N(0, \mathrm{Var}_f(I(T_{fb} < T)))$. The CLT approximation can be used to obtain a 99% confidence interval (CI), the half width (HW) of which is given by $2.56\sqrt{\mathrm{Var}_f(I(T_{fb} < T))/n}$. The relative error (RE) is defined as the ratio $\mathrm{HW}/\gamma \approx 2.56/\sqrt{n\gamma}$. Obviously, for a fixed $n$, RE $\to \infty$ as $\gamma \to 0$. This is the problem when using standard simulation to estimate the probability of a rare event, such as $\gamma$. Importance sampling can be used to overcome this inherent problem.
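To make the last point concrete, the short sketch below (our own illustration; the function names are hypothetical) evaluates the relative half-width $2.56\sqrt{(\gamma-\gamma^2)/n}/\gamma$ of the standard estimator and inverts it to estimate how many busy cycles standard simulation would need to reach a 10% relative error.

```python
import math

Z_99 = 2.56  # normal quantile used in the text for a 99% confidence interval


def relative_error_standard(gamma: float, n: int) -> float:
    """Relative half-width HW/gamma of the standard estimator after n busy cycles."""
    return Z_99 * math.sqrt((gamma - gamma * gamma) / n) / gamma


def cycles_needed(gamma: float, target_re: float) -> int:
    """Number of busy cycles n for which relative_error_standard(gamma, n) <= target_re."""
    return math.ceil((Z_99 / target_re) ** 2 * (1.0 - gamma) / gamma)


for gamma in (1e-3, 1e-6, 1e-9):
    print(f"gamma = {gamma:.0e}: {cycles_needed(gamma, 0.10):.2e} cycles for 10% RE")
```

The required effort grows like $1/\gamma$: for $\gamma = 10^{-9}$ it is on the order of $10^{11}$ busy cycles, which is why a variance reduction technique is needed.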

Now, let $g$ be another underlying probability measure, and let $\omega$ be a sample path (e.g., a busy cycle) in the set $\Omega$ of all possible sample paths. Denote by $dg(\omega)$ the probability of the sample path $\omega$ according to the new probability measure $g$. (Similarly, $df(\omega)$ is the probability of the sample path $\omega$ according to the original probability measure $f$.) Note that $\gamma$ can be written as follows

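$$\gamma = \int_{\omega \in \Omega} I_\omega(T_{fb} < T)\, df(\omega) = \int_{\omega \in \Omega} I_\omega(T_{fb} < T)\, \frac{df(\omega)}{dg(\omega)}\, dg(\omega) = \int_{\omega \in \Omega} I_\omega(T_{fb} < T)\, L(\omega)\, dg(\omega) = E_g(I(T_{fb} < T)\, L), \qquad (9)$$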


where $I_\omega(\cdot)$ is the indicator function evaluated for sample path $\omega$, and $L(\omega) = df(\omega)/dg(\omega)$ is the likelihood ratio. It is clear from the above equation that the only condition imposed on the new probability measure $g$ is: $dg(\omega) > 0$ whenever $I_\omega(T_{fb} < T)\, df(\omega) > 0$. It follows that we can simulate the system using the new probability measure $g$ to obtain $n$ independent samples of $I(T_{fb} < T)L$, say, $I_1L_1, I_2L_2, \ldots, I_nL_n$. An unbiased estimate of $\gamma$ is given by $\hat\gamma = \sum_{i=1}^{n} I_iL_i/n$. The variance of this estimator is $\mathrm{Var}_g(I(T_{fb} < T)L)/n = (E_g(I(T_{fb} < T)L^2) - \gamma^2)/n$. Notice that a zero-variance estimator is obtained if we choose the new probability measure $g$ such that, for all $\omega \in \Omega$, $dg(\omega) = I_\omega(T_{fb} < T)\, df(\omega)/\gamma$. However, this is not possible, since it requires knowledge of $\gamma$, the very quantity we are trying to estimate! The main challenge in importance sampling is to find a robust and easily implementable new probability measure $g$ such that

$$E_g(I(T_{fb} < T)L^2) = E_f(I(T_{fb} < T)L) \ll E_f(I(T_{fb} < T)). \qquad (10)$$

This means that the variance of the importance sampling estimate is much less than the variance of the standard simulation estimate. In other words, for the same simulation effort (e.g., the same number of busy cycles $n$), importance sampling yields an estimate with much smaller relative error than that obtained using standard simulation. (This also implies a significant speed-up in the simulation time needed to achieve a given accuracy.) Notice from the above equation that a large variance reduction is obtained if $L(\omega) = df(\omega)/dg(\omega) \ll 1$ whenever $I_\omega(T_{fb} < T) = 1$. That is, $g$ should be chosen so as to significantly increase the probability of the rare event $\{T_{fb} < T\}$. An “effective” change of probability measure, $g$, is one for which the relative error (RE) remains bounded even as the probability of the rare event tends to zero. This is a desirable property, which implies that the simulation effort (e.g., the number of samples $n$) needed to achieve a given relative error remains the same as the rare event becomes rarer. In some cases, this property may be established empirically for a given importance sampling technique, as will be demonstrated in our experimental results of Section 4.
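The identity in Equation (9) can be checked on a toy problem that is not taken from this paper: estimating $\gamma = P(X > a)$ for $X \sim$ Exp(1), whose exact value is $e^{-a}$. The sketch samples from an assumed biased density $g$ = Exp(rate $1/a$), which places typical samples near the rare threshold, and weights each hit by $L = f/g$. It is a minimal illustration of the estimator only, not the queueing change of measure used later; the function name is hypothetical.

```python
import math
import random


def is_tail_estimate(a: float, n: int, seed: int = 0):
    """IS estimate of gamma = P(X > a), X ~ Exp(1), sampling from g = Exp(rate 1/a).

    Returns (estimate, relative half-width of the 99% confidence interval).
    """
    rng = random.Random(seed)
    rate_g = 1.0 / a                      # assumed biased density with mean a
    total = total_sq = 0.0
    for _ in range(n):
        y = rng.expovariate(rate_g)       # sample drawn under g
        if y > a:                         # indicator I(y > a)
            lr = math.exp(-y) / (rate_g * math.exp(-rate_g * y))  # L = f(y)/g(y)
            total += lr
            total_sq += lr * lr
    mean = total / n
    var = (total_sq / n - mean * mean) * n / (n - 1)   # sample variance of I*L
    return mean, 2.56 * math.sqrt(var / n) / mean


a = 20.0
est, rel_err = is_tail_estimate(a, 100_000)
print(f"IS estimate {est:.4e}, exact {math.exp(-a):.4e}, relative error {rel_err:.3f}")
```

With $a = 20$ ($\gamma \approx 2 \times 10^{-9}$), $10^5$ weighted samples should give a relative error of a few percent, whereas standard simulation would need on the order of $10^{11}$ samples for 10%. For this particular $g$ the relative variance still grows roughly linearly in $a$, so it is not a bounded-relative-error scheme in the sense just defined.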


3  FAST  SIMULATION  OF  SIMPLE  QUEUES 


Consider a simple queue with a finite buffer. The cell arrival “rate” is assumed to be sufficiently smaller than the service “rate”, so that reaching a full buffer (or buffer overflow) is a rare event. Efficient simulation involving a rare full-buffer event has been considered by many authors (see, for example, Parekh and Walrand (1989) and Sadowsky (1991)). Another rare event of interest is the n-consecutive-cell-loss event, which may occur only after the full buffer is reached. In this section we consider these two related rare events, and develop an importance sampling heuristic to speed up simulations involving a rare consecutive-cell-loss event.


3.1  Rare  Full-Buffer  Event 

In a GI/GI/1/k queue, let us again consider the estimation of the probability of reaching a full buffer in a busy cycle, $\gamma$. As in Section 2.3, this probability can be expressed as $\gamma = E_f(I(T_{fb} < T))$, where the expectation is taken with respect to the original probability measure $f$. Since $\{T_{fb} < T\}$ is a rare event (i.e., $\gamma \approx 0$), standard simulation is very inefficient, as it yields 0 for the indicator function on almost all busy cycles. Using importance sampling, we have $\gamma = E_f(I) = E_g(IL)$, where $f$ and $g$ are the original and the new probability measures, respectively, and $L$ is the likelihood ratio. Denote by $dg(\omega)$ the probability of a sample path $\omega$ according to the new probability measure $g$. (Similarly, $df(\omega)$ is the probability of a sample path $\omega$ according to the original probability measure $f$.) Then $L(\omega) = df(\omega)/dg(\omega)$ is the likelihood ratio associated with a sample path $\omega$; it can be computed easily during the simulation. For example, let $t^i_{A,j}$ (resp., $t^i_{S,j}$), $i = 1, 2, \ldots, N_j$, be the cell arrival (resp., departure) instants in the $j$-th busy cycle. Furthermore, let $g^i_{A,j}(t)$ (resp., $g^i_{S,j}(t)$) be the new $i$-th inter-arrival (resp., service) time density used to simulate the system with importance sampling. The likelihood ratio, $L_j$, associated with the $j$-th busy cycle takes the form


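$$L_j = \prod_{i=1}^{N_j} \frac{f_A(t^{i+1}_{A,j} - t^{i}_{A,j})}{g^i_{A,j}(t^{i+1}_{A,j} - t^{i}_{A,j})} \times \frac{f_S(t^{i}_{S,j} - t^{i-1}_{S,j})}{g^i_{S,j}(t^{i}_{S,j} - t^{i-1}_{S,j})}. \qquad (11)$$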


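Note that $t^{N_j+1}_{A,j} = t^{1}_{A,j+1}$ is the instant at which the $j$-th busy cycle ends and the $(j+1)$-th busy cycle begins, and that the first service of the cycle starts at $t^{0}_{S,j} = t^{1}_{A,j}$. Thus, $L_j$ can be computed recursively at arrival and departure events during the simulation.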
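Equation (11) says that each sampled inter-arrival or service time contributes one factor $f(x)/g(x)$ to the running likelihood ratio. A minimal sketch, assuming exponential densities (the M/M/1 case with swapped rates) and hypothetical helper names:

```python
import math


def exp_pdf(rate):
    """Exponential density with the given rate."""
    return lambda t: rate * math.exp(-rate * t)


def update_lr(lr, x, f, g):
    """Multiply the running likelihood ratio by f(x)/g(x) for one duration x drawn under g."""
    return lr * f(x) / g(x)


# Assumed example: M/M/1 with lam = 0.8, mu = 1.0, simulated with swapped rates.
lam, mu = 0.8, 1.0
f_A, g_A = exp_pdf(lam), exp_pdf(mu)   # original vs. new inter-arrival density
f_S, g_S = exp_pdf(mu), exp_pdf(lam)   # original vs. new service density

L = 1.0
L = update_lr(L, 1.0, f_A, g_A)        # an inter-arrival time of 1.0 sampled under g_A
L = update_lr(L, 0.5, f_S, g_S)        # a service time of 0.5 sampled under g_S
print(L)                               # approximately exp(0.1) = 1.10517
```

Over long busy cycles the running product can underflow or overflow; accumulating $\ln f(x) - \ln g(x)$ instead is a common alternative.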

Now, let $b$ be the number of independent “biased” (using importance sampling) busy cycles used to obtain estimates for the mean and the variance of the r.v. $IL$. These estimates are given by

$$\hat\mu_l = \sum_{j=1}^{b} \frac{I_j L_j}{b}, \qquad \hat\sigma_l^2 = \sum_{j=1}^{b} \frac{(I_j L_j - \hat\mu_l)^2}{b - 1}.$$
From the central limit theorem, for large $b$, the estimate $\hat\mu_l$ is approximately normally distributed. It follows that the relative half-width (in percentage) of the 99% confidence interval for the above estimator is given by $2.56\,\hat\sigma_l/(\hat\mu_l\sqrt{b}) \times 100$.
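The estimators above and the 99% relative half-width can be computed from the $b$ per-cycle values $I_jL_j$ as follows. This is a sketch with a hypothetical function name, and the four sample values are made up purely to exercise the code.

```python
import math


def is_summary(samples):
    """Return (mu_hat, sigma_hat, relative 99% half-width in %) from b values I_j * L_j."""
    b = len(samples)
    mu_hat = sum(samples) / b
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / (b - 1)
    sigma_hat = math.sqrt(var_hat)
    rel_hw = 2.56 * sigma_hat / (mu_hat * math.sqrt(b)) * 100.0
    return mu_hat, sigma_hat, rel_hw


# Made-up values of I_j * L_j for b = 4 biased busy cycles, only to exercise the code.
mu_hat, sigma_hat, rel_hw = is_summary([0.0, 2e-6, 0.0, 6e-6])
print(mu_hat, sigma_hat, rel_hw)
```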

In the following we consider the optimal change of measure (importance sampling distribution) to efficiently estimate $\gamma$. Let $F_A(\theta) = \int_0^\infty e^{\theta t} f_A(t)\,dt$ be the moment generating function of the inter-arrival times. Define $f_A^{\theta}(t) = e^{\theta t} f_A(t)/F_A(\theta)$; this is another pdf, obtained by exponentially tilting (twisting) the pdf $f_A(t)$ at a parameter $\theta$. Similarly, $F_S(\theta) = \int_0^\infty e^{\theta t} f_S(t)\,dt$ is the moment generating function of the service times, and $f_S^{\theta}(t) = e^{\theta t} f_S(t)/F_S(\theta)$ is the corresponding exponentially tilted pdf.

Using heuristic arguments based on the theory of large deviations (Bucklew 1990), Parekh and Walrand (1989) proposed an importance sampling distribution to efficiently estimate the probability of buffer overflow in a GI/GI/1/k queue. In Sadowsky (1991), this distribution was proved to be the unique asymptotically (as $k \to \infty$) optimal change of measure. Let $\theta^*$ be the non-trivial solution of the equation

$$F_A(-\theta^*)\, F_S(\theta^*) = 1. \qquad (12)$$

Then the optimal change of measure is obtained by simulating the GI/GI/1/k queue with the exponentially tilted densities $g_A(t) = f_A^{-\theta^*}(t)$ and $g_S(t) = f_S^{\theta^*}(t)$. Importance sampling is “turned on” at the start of each busy cycle, and is “turned off” at the occurrence of the rare event. The moment generating functions for the new (optimal) inter-arrival and service times are given by


$$G_A(\theta) = \frac{F_A(\theta - \theta^*)}{F_A(-\theta^*)}, \qquad G_S(\theta) = \frac{F_S(\theta + \theta^*)}{F_S(\theta^*)}. \qquad (13)$$


Consider the M/M/1/k queue with its arrival rate $\lambda$ much smaller than its service rate $\mu$ (i.e., $\lambda \ll \mu$), so that a full buffer is a rare event. Then $F_A(-\theta) = \lambda/(\lambda + \theta)$ and $F_S(\theta) = \mu/(\mu - \theta)$, for $\theta < \mu$. Solving the equation $F_A(-\theta^*)F_S(\theta^*) = 1$ for $\theta^*$ (it reduces to $\theta^*(\mu - \lambda - \theta^*) = 0$), we get $\theta^* = \mu - \lambda$. It follows that $G_A(\theta) = \mu/(\mu - \theta)$ and $G_S(\theta) = \lambda/(\lambda - \theta)$, i.e., optimally, the M/M/1/k queue is simulated with arrival rate $\mu$ and service rate $\lambda$. This change of measure accelerates the arrival process relative to the service process, thus increasing the probability of a full buffer in the simulated system.
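Equation (12) always has the trivial root $\theta = 0$; the non-trivial root $\theta^*$ can be found numerically. As a sketch (function names are ours), bisection is applied to $h(\theta) = \ln F_A(-\theta) + \ln F_S(\theta)$, which is negative just to the right of 0 whenever the load is below 1. For the M/M/1 case the result should reproduce $\theta^* = \mu - \lambda$ and the swapped rates.

```python
import math


def solve_theta_star(h, lo, hi, tol=1e-12):
    """Bisection for the non-trivial root of h(theta) = ln F_A(-theta) + ln F_S(theta).

    Assumes h(lo) < 0 < h(hi): lo lies between the trivial root 0 and theta*,
    and hi lies beyond theta* (inside the domain of both MGFs).
    """
    if not h(lo) < 0 < h(hi):
        raise ValueError("bracket must satisfy h(lo) < 0 < h(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mm1_h(lam, mu):
    """h(theta) for Poisson(lam) arrivals and Exp(mu) service, valid for theta < mu."""
    return lambda th: math.log(lam / (lam + th)) + math.log(mu / (mu - th))


lam, mu = 0.8, 1.0
theta_star = solve_theta_star(mm1_h(lam, mu), lo=1e-6, hi=mu - 1e-9)
print(theta_star, lam + theta_star, mu - theta_star)  # tilted arrival and service rates
```

The tilted arrival rate $\lambda + \theta^*$ and service rate $\mu - \theta^*$ come out as 1.0 and 0.8, i.e., the original rates are swapped.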

In the next section, we use the optimal importance sampling distribution (as outlined above) in a heuristic to estimate very small consecutive-cell-loss probabilities.


3.2  Rare  Consecutive-Cell-Loss  Event 


In this section we consider the estimation of the probability of losing $n$ or more consecutive cells in a busy cycle, $\gamma_n$ (see Section 2 for notation). This probability can be expressed as $\gamma_n = E_f(I(T_n < T))$, where the expectation is taken with respect to the original probability measure $f$. Here $T_n$ is a r.v. denoting the time to the first n-consecutive-cell-loss event in a busy cycle, and $T$ is a r.v. denoting the cycle time (also defined in Section 2). Note that $T_n = \infty$ for a busy cycle in which there is no n-consecutive-cell-loss. Here too, since $\{T_n < T\}$ is a rare event, standard simulation is very inefficient. In fact, the event $\{T_n < T\}$ must be at least as rare as the event $\{T_{fb} < T\}$, since the former can occur only after the latter has occurred (and need not occur even then). Using importance sampling, we have $\gamma_n = E_f(I) = E_g(IL)$, where $f$ and $g$ are the original and the new probability measures, respectively, and $L$ is the likelihood ratio. Based on $b$ independent “biased” (using importance sampling) busy cycles, estimates of the mean $\hat\mu_l$ and the variance $\hat\sigma_l^2$ (and hence confidence intervals) are obtained as described in Section 3.1.
To the best of our knowledge, the problem of estimating the probability of a rare consecutive-cell-loss event ($\gamma_n$) using importance sampling has not been considered before. Note that this rare event can only occur during a full-buffer period, i.e., after the occurrence of a typically rare full-buffer event. Therefore, it seems intuitive to use two “biasing” (importance sampling) schemes: one to reach a full buffer, and another, if necessary, to lose $n$ consecutive cells during that full-buffer period. The main idea of our importance sampling heuristic is to use the optimal change of measure to reach the full-buffer state (as described in Section 3.1). Once the full-buffer state is reached (and every time it is reached, until $n$ cells have been lost consecutively), additional “biasing” (e.g., by increasing the arrival “rate”) is applied, if necessary, to increase the probability of $n$ or more arrivals (losses) during the full-buffer period. “Biasing” is turned off as soon as the rare event of interest occurs, i.e., $n$ arrivals during a full-buffer period. Otherwise, biasing is continued according to the optimal change of measure (of Section 3.1) until the next full-buffer period or the end of the busy cycle. The implementation details of “biasing” during full-buffer periods may differ depending on the particular arrival and service processes being considered. These details will be discussed for each of the models used in our experiments of Section 4. Empirical results from these experiments demonstrate the effectiveness of the above importance sampling heuristic for estimating $\gamma_n$. The same heuristic can also be used to estimate $\Gamma_n$, the frequency of the n-consecutive-cell-loss event. In either case, “speed-ups” of several orders of magnitude over standard simulation can be obtained.

It is important to mention that, in general, the simulation effort (with importance sampling) slowly increases with the number of consecutive cell losses of interest, i.e., the importance sampling scheme is not asymptotically (as $n \to \infty$) efficient. (This can, perhaps, be seen from the experimental results for the M/D/1/k queue in Section 4.3.) However, this is not due to the increased rarity of the n-consecutive-cell-loss event, but to the increase in the inherent variance of the probability of $n$ or more arrivals during a full-buffer period. Let $V$ be a r.v. denoting the length of a full-buffer period; then, for Poisson arrivals with rate $\lambda$, this probability is given by $P_n(V) = e^{-\lambda V}\sum_{i=n}^{\infty} (\lambda V)^i/i!$. Clearly, the variance of $P_n(V)$ increases with the variance of $V$ and is amplified for high values of $n$. It is this inherent increase in variability that cannot be reduced by importance sampling. In fact, for an M/M/1/k queue, the full-buffer periods, $V$, are independent and exponentially distributed with mean $1/\mu$. In this case, samples of $P_n(V)$ observed during simulation can be replaced by their (deterministic) mean $p_n = (\lambda/(\lambda + \mu))^n$. In this way, the variability of $P_n(V)$ does not affect the simulation results. Indeed, for an M/M/1/k queue, this special implementation of our heuristic is asymptotically efficient (as $n \to \infty$), which is clearly demonstrated by the empirical results in Section 4.1.
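The following sketch is our own numerical check, using a simple trapezoidal rule and hypothetical function names. It evaluates the Poisson tail $P_n(v)$ and averages it over an exponential full-buffer period $V \sim$ Exp($\mu$). It should reproduce $E[P_n(V)] = (\lambda/(\lambda+\mu))^n$, and it shows the coefficient of variation of $P_n(V)$ growing with $n$, which is the inherent variability referred to above.

```python
import math


def poisson_tail(n, x):
    """P(N >= n) for N ~ Poisson(x), summed upward from i = n (no cancellation)."""
    if n <= 0:
        return 1.0
    if x <= 0.0:
        return 0.0
    term = math.exp(-x + n * math.log(x) - math.lgamma(n + 1))  # e^{-x} x^n / n!
    total, i = 0.0, n
    while True:
        total += term
        i += 1
        term *= x / i
        if term == 0.0 or (i > x and term < 1e-16 * total):
            return total


def pn_mean_cv(lam, mu, n, h=0.02, v_max=40.0):
    """Mean and coefficient of variation of P_n(V) for V ~ Exp(mu) (trapezoidal rule)."""
    steps = int(round(v_max / h))
    m1 = m2 = 0.0
    for j in range(steps + 1):
        v = j * h
        w = (0.5 if j in (0, steps) else 1.0) * h * mu * math.exp(-mu * v)
        p = poisson_tail(n, lam * v)
        m1 += w * p
        m2 += w * p * p
    return m1, math.sqrt(m2 - m1 * m1) / m1


lam, mu = 0.8, 1.0
for n in (2, 4, 8):
    mean, cv = pn_mean_cv(lam, mu, n)
    print(f"n={n}: E[P_n(V)]={mean:.6e}  closed form={(lam / (lam + mu)) ** n:.6e}  CV={cv:.2f}")
```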


4  EXPERIMENTAL  RESULTS 

In this section we use the fast simulation methods discussed in Sections 3.1 and 3.2 to evaluate a model of the Leaky Bucket (LB) algorithm. For validation purposes, the simulation of an M/M/1/k queue is considered in Section 4.1. The operation of the LB algorithm and its model are described in Section 4.2. The evaluation of this model is considered in Sections 4.3 and 4.4, for Poisson and two-phase burst/silence (TPBS) cell arrival processes, respectively. The empirical results displayed here include estimates of $\gamma_n$ (i.e., the probability of losing $n$ or more consecutive cells in a busy cycle), $E(O_n)$ (i.e., the expected number of n-consecutive-cell-loss events in a busy cycle) and $\Gamma_n$ (i.e., the steady-state frequency of the n-consecutive-cell-loss event).
4.1 Simulation of the M/M/1/k Queue

In this section we consider the efficient simulation of an M/M/1/k queue to estimate the probability of consecutive cell loss in a busy cycle. For this model, the analytical results in Section 2.2 can be used to validate the statistical output from simulation. As outlined in Section 3.2, our importance sampling heuristic makes use of two different “biasing” schemes. The first is optimal “biasing” (as described in Section 3.1) to reach the full-buffer state (i.e., the M/M/1/k queue is simulated with arrival rate $\mu$ and service rate $\lambda$). The second is “biasing” during full-buffer periods, which, in the special case of an M/M/1/k queue, can be implemented as follows. As argued in Section 3.2, the probability of $n$ or more arrivals (losses) during a full-buffer period is given by $p_n = (\lambda/(\lambda + \mu))^n$, which is typically very small in the original queue.

In the simulated queue, we increase this probability to $p_s$ (a constant sufficiently higher than $p_n$; for example, $p_s = 0.5$). With probability $p_s$, the full-buffer period is considered to be a “successful” overload period (i.e., having $n$ or more arrivals). Let $U$ be a uniform random variable ($0 < U < 1$). Every time the full-buffer state is reached (until the consecutive loss of $n$ cells), we take a sample $u$ of $U$. If $u < p_s$, then the n-consecutive-cell-loss event is considered to have occurred, and “biasing” is turned off until the end of the current busy cycle. In this case, the likelihood ratio is updated by the multiplication factor $p_n/p_s$. (Note that in this implementation, a sample of the full-buffer period need not be generated, and the simulation is continued, from the instant of reaching the full-buffer state, as if a departure event had just occurred, leaving the queue with $k - 1$ cells.) Otherwise, if $u \geq p_s$, then the n-consecutive-cell-loss event is considered not to have occurred, and “biasing” is continued as described in Section 3.1 until the next full-buffer period or the end of the current busy cycle. In this case, the likelihood ratio is updated by the multiplication factor $(1 - p_n)/(1 - p_s)$.
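The two-stage scheme above can be sketched for the M/M/1/k queue as follows. Two simplifications are ours. First, because the swapped rates leave the total event rate $\lambda + \mu$ unchanged, the likelihood ratio depends only on the sequence of arrivals and departures, so only the embedded jump chain is simulated. Second, each cycle is stopped as soon as the rare event occurs, which suffices for $\gamma_n$ but not for $E(O_n)$. For validation we use a closed form from a standard gambler's-ruin argument, $\gamma_n = \gamma\, p_n/(1 - (1-p_n)h)$, where $\gamma$ and $h$ are the probabilities of reaching $k$ before 0 from levels 1 and $k-1$; this is our own derivation, not necessarily the form of the Section 2.2 results. Function names are hypothetical.

```python
import math
import random


def gamma_n_is(lam, mu, k, n, ps, cycles, seed=1):
    """Sketch: importance sampling estimate of gamma_n in an M/M/1/k queue.

    Returns (estimate, relative half-width of the 99% CI in percent).
    """
    rng = random.Random(seed)
    p_up = lam / (lam + mu)   # original P(next event is an arrival | queue non-empty)
    q_up = mu / (lam + mu)    # same probability with swapped rates (arrival mu, service lam)
    pn = p_up ** n            # P(n or more arrivals in one Exp(mu) full-buffer period)
    samples = []
    for _ in range(cycles):
        x, lr, hit = 1, 1.0, 0            # cycle starts when a cell enters an empty queue
        while x > 0:
            if x == k:                    # full buffer: a Bernoulli(ps) draw replaces the period
                if rng.random() < ps:
                    lr *= pn / ps         # "successful" overload period
                    hit = 1
                    break                 # rare event seen; I = 1 for this cycle
                lr *= (1.0 - pn) / (1.0 - ps)
                x = k - 1                 # continue as if a departure had just occurred
            elif rng.random() < q_up:     # biased arrival
                lr *= p_up / q_up
                x += 1
            else:                         # biased departure
                lr *= (1.0 - p_up) / (1.0 - q_up)
                x -= 1
        samples.append(hit * lr)
    mean = sum(samples) / cycles
    var = sum((s - mean) ** 2 for s in samples) / (cycles - 1)
    return mean, 2.56 * math.sqrt(var / cycles) / mean * 100.0


def gamma_n_exact(lam, mu, k, n):
    """Gambler's-ruin closed form, used here only as a check (our derivation)."""
    r = mu / lam
    gamma = (r - 1.0) / (r ** k - 1.0)         # reach k before 0, starting from 1
    h = (r ** (k - 1) - 1.0) / (r ** k - 1.0)  # reach k before 0, starting from k-1
    pn = (lam / (lam + mu)) ** n
    return gamma * pn / (1.0 - (1.0 - pn) * h)


lam, mu, k, n = 0.8, 1.0, 25, 8
ps = max((mu / (lam + mu)) ** n, 1.0 - lam / mu)   # p_s = max(p'_n, 1 - lam/mu)
est, re_pct = gamma_n_is(lam, mu, k, n, ps, cycles=20_000)
print(f"IS: {est:.3e} (+/- {re_pct:.1f}%)  closed form: {gamma_n_exact(lam, mu, k, n):.3e}")
```

With the Table 1 parameters ($\lambda = 0.8$, $\mu = 1$, $k = 25$) and $n = 8$, 20000 biased cycles should give an estimate within a few percent of the closed form (about $7 \times 10^{-6}$). The $p_s$ used is the heuristic $\max(p'_n, 1 - \lambda/\mu)$ discussed in this section.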

Now let us consider the M/M/1/k queue with $\lambda = 0.8$ cells per unit of time, $\mu = 1.0$ cells per unit of time and $k = 25$. In Table 1, for increasing $n$, we give fast simulation estimates of the cycle-based quantities, namely, the n-consecutive-cell-loss probability ($\gamma_n$) and the expected number of n-consecutive-cell-loss events $E(O_n)$. Numerical results from analysis are also displayed. Consistent with our remark in Section 2.2, note that $E(O_n) \approx \gamma_n$ for values of $n > 8$. Also, note that the frequency $\Gamma_n$ can be determined by $\Gamma_n = E(O_n)/E(N) = (1 - \lambda/\mu)\, E(O_n)$.

Using different arrival and service rates, experiments indicate that, for high $n$, the lowest relative error can be obtained by setting $p_s$ (approximately) to $1 - \lambda/\mu$. Therefore, the “biasing” probability $p_s$ is heuristically set to $\max(p'_n, 1 - \lambda/\mu)$, where $p'_n$ is the (new) probability of $n$ or more arrivals during a full-buffer period in the simulated system (i.e., with the optimal change of measure as given in Section 3.1). For the simulated M/M/1/k queue, it follows that $p'_n = (\mu/(\lambda + \mu))^n$. In total, 25600 “biased” busy cycles were simulated to obtain the estimates and their relative error (i.e., the relative half-width of the 99% confidence interval) in percentage. Note that the fast simulation results are in good agreement with the numerical results from analysis. Also, the relative error does not increase for larger values of $n$; this verifies the asymptotic optimality of this particular implementation of our proposed importance sampling method when applied to the M/M/1/k queue.
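The heuristic choice of $p_s$ is a one-liner. The sketch below (hypothetical function name) tabulates it for the Table 1 rates: $p'_n$ dominates only for very small $n$, and the floor $1 - \lambda/\mu = 0.2$ takes over from $n = 3$ on.

```python
def ps_heuristic(lam, mu, n):
    """Heuristic biasing probability p_s = max(p'_n, 1 - lam/mu) for the M/M/1/k scheme."""
    p_prime_n = (mu / (lam + mu)) ** n   # P(n or more arrivals) under swapped rates
    return max(p_prime_n, 1.0 - lam / mu)


for n in range(1, 7):
    print(n, round(ps_heuristic(0.8, 1.0, n), 4))
```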


4.2  The  Leaky  Bucket  (LB)  Algorithm 

An ATM connection is established with an admission contract, which specifies the traffic characteristics of the source and the quality of service (QoS) to be guaranteed by the network. In order for the network to ensure that the admission contract is not violated, the usage parameter control (UPC) procedure is invoked to monitor the actual traffic and to police the excess traffic violating the contract. The Leaky Bucket (LB) algorithm is a popular UPC procedure and can easily be implemented with counters (see Turner (1986)). Each time a cell arrives, the counter is incremented by one. As long as the counter has a positive value, it is decremented at fixed intervals, $d$. When the cell arrival “rate” exceeds the periodic decrement “rate”, the counter value will increase. If the counter reaches a pre-specified limit, say, $k$, then the source is considered to have exceeded its admission contract, and subsequent cells are discarded (or marked for policing) until the counter value falls below the limit again. The operation of this LB algorithm can be modeled as a GI/D/1/k queue, in which the service time is deterministic and identical to the decrement interval, $d$. An arriving cell is lost if it finds a full buffer.
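A minimal sketch of the counter described above, treated as a GI/D/1/k queue: each accepted cell increments the counter, a decrement clock of period $d$ runs while the counter is positive, and a cell arriving when the counter is at the limit $k$ is discarded. Starting the clock when the first cell enters an empty counter mirrors the queueing model; a free-running hardware clock would behave slightly differently. The function names and arrival instants are hypothetical.

```python
def leaky_bucket_losses(arrivals, d, k):
    """Return a loss flag per cell for the LB counter modelled as a GI/D/1/k queue.

    `arrivals` is a sorted list of cell arrival instants.
    """
    count, next_dec = 0, float("inf")
    lost = []
    for t in arrivals:
        while count > 0 and next_dec <= t:   # apply decrements due up to time t
            count -= 1
            next_dec = next_dec + d if count > 0 else float("inf")
        if count == k:
            lost.append(True)                # counter at its limit: cell discarded
        else:
            count += 1
            if count == 1:
                next_dec = t + d             # decrement clock starts with the first cell
            lost.append(False)
    return lost


def longest_loss_run(lost):
    """Length of the longest run of consecutive lost cells."""
    best = run = 0
    for flag in lost:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


flags = leaky_bucket_losses([0.0, 0.1, 0.2, 0.3, 1.5], d=1.0, k=2)
print(flags, longest_loss_run(flags))  # [False, False, True, True, False] 2
```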

For a two-phase burst/silence source model (see Section 4.4), the stationary cell loss probability can be obtained by a numerical method whose complexity grows in proportion to the value of $k$ (Rathgeb 1991). No analytical or numerical method is yet available to obtain the probability of consecutive cell loss in a GI/D/1/k queue. To avoid the restrictions necessary for analytical tractability and/or numerical feasibility, simulation is often preferred for the evaluation of realistic models of the LB algorithm. However, standard simulation is not efficient because consecutive cell loss is a rare event. Accurate and efficient estimation of very small probabilities, such as $\gamma$, using importance sampling has been considered in Nicola et al. (1994). In the next two sections, we use the importance sampling heuristic proposed in Section 3 to efficiently estimate $\gamma_n$, $E(O_n)$ and $\Gamma_n$ in a model of the LB algorithm with (non-bursty) Poisson and (bursty) TPBS cell arrival processes.

4.3  Poisson  Cell  Arrival  Process 

In this section we use importance sampling to efficiently estimate the probability of consecutive cell loss in a busy cycle of an M/D/1/k queueing model of the LB algorithm (i.e., for a Poisson cell arrival process). The arrival rate is $\lambda$ and the service time is a constant $d$. As outlined in Section 3.1, the optimal change of measure to reach the full-buffer state can be obtained by solving Equation (12) for $\theta^*$. The corresponding inter-arrival and service time densities can then be determined from their moment generating functions as given in Equation (13). It follows that the optimal service times are also deterministic and identical to the original ones (i.e., there is no change in the service process). However, the arrival process does change, so as to increase the probability of the rare full-buffer event. We note that full-buffer periods (i.e., the actual remaining service time upon reaching the full-buffer state) in the same busy cycle are neither independent nor identically distributed. Therefore, in this implementation, these full-buffer periods must be simulated (unlike the implementation for the M/M/1/k queue). The probability of $n$ or more arrivals (losses) during a full-buffer period depends on the remaining service time ($r < d$) and is given by $P_n(r) = e^{-\lambda r}\sum_{i=n}^{\infty} (\lambda r)^i/i!$. This probability is typically very small in the original system, and, therefore, “biasing” is necessary to increase the probability of “success” (i.e., $n$ or more arrivals) during the full-buffer period. In the simulated queue, we increase this probability to $p_s$ (a constant sufficiently higher than $P_n(r)$; for example, $p_s = 0.5$). Every time the full-buffer state is reached (until the consecutive loss of $n$ cells), we take a sample $u$ of a uniform random variable $U$ (defined in Section 4.1). If $u < p_s$, then the n-consecutive-cell-loss event is considered to have occurred, and “biasing” is turned off until the end of the current busy cycle. In this case, at the end of the full-buffer period, the likelihood ratio is updated by the multiplication factor $P_n(r)/p_s$. Otherwise, if $u \geq p_s$, then the n-consecutive-cell-loss event is considered not to have occurred, and “biasing” is continued immediately after the full-buffer period (as described in Section 3.1) until the next full-buffer period or the end of the current busy cycle. In this case, at the end of the full-buffer period, the likelihood ratio is updated by the multiplication factor $(1 - P_n(r))/(1 - p_s)$.
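One biased full-buffer period of the M/D/1/k scheme can be sketched as a single function: it computes $P_n(r)$, compares a uniform sample $u$ with $p_s$, and returns the likelihood-ratio factor $P_n(r)/p_s$ or $(1 - P_n(r))/(1 - p_s)$. The names are ours, and the Poisson tail is summed upward from $i = n$ to avoid cancellation when $P_n(r)$ is tiny.

```python
import math


def poisson_tail(n, x):
    """P(N >= n) for N ~ Poisson(x), summed upward from i = n."""
    if n <= 0:
        return 1.0
    if x <= 0.0:
        return 0.0
    term = math.exp(-x + n * math.log(x) - math.lgamma(n + 1))  # e^{-x} x^n / n!
    total, i = 0.0, n
    while True:
        total += term
        i += 1
        term *= x / i
        if term == 0.0 or (i > x and term < 1e-16 * total):
            return total


def full_buffer_step(lam, r, n, ps, u):
    """One biased full-buffer period of remaining length r (sketch).

    u is a Uniform(0, 1) sample. Returns (success, likelihood-ratio factor).
    """
    p = poisson_tail(n, lam * r)            # P_n(r) under the original arrival rate
    if u < ps:
        return True, p / ps                  # n-consecutive-cell-loss declared
    return False, (1.0 - p) / (1.0 - ps)    # not declared; biasing continues


print(full_buffer_step(lam=0.8, r=0.6, n=4, ps=0.5, u=0.3))
```

Since $p_s \cdot P_n(r)/p_s + (1-p_s)(1-P_n(r))/(1-p_s) = 1$, the expected factor under the biased draw is one, which is what keeps the estimator unbiased.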

Note that when the full-buffer period $r$ is very small (i.e., $r \ll d$), “biasing” may yield non-typical sample paths, resulting in extremely small values for the likelihood ratio and leading to unstable estimates. To overcome this problem, the above heuristic is modified as follows. Upon reaching the full-buffer state, $P_n(r)$ is determined, and “biasing” during the full-buffer period (as outlined above) is activated only
if,  say,  Pn(r)/Pn(d)  >  4  x  10  A  In  this  way,  “biasing”  is  activated  only  when  a  full-buffer  period  is 
sufficiently large to yield a rare (but typical) sample path. As long as the consecutive-cell-loss event has not occurred, “biasing” to reach the next full-buffer period is resumed as outlined above. The following example shows that the above heuristic with this modification is quite robust and effective.
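The activation rule can be sketched as a predicate on the ratio $P_n(r)/P_n(d)$ (hypothetical function name). The threshold is left as a parameter, and the value used below is purely illustrative rather than the one used in the paper.

```python
import math


def poisson_tail(n, x):
    """P(N >= n) for N ~ Poisson(x), summed upward from i = n."""
    if n <= 0:
        return 1.0
    if x <= 0.0:
        return 0.0
    term = math.exp(-x + n * math.log(x) - math.lgamma(n + 1))
    total, i = 0.0, n
    while True:
        total += term
        i += 1
        term *= x / i
        if term == 0.0 or (i > x and term < 1e-16 * total):
            return total


def biasing_active(lam, r, d, n, threshold):
    """Bias the full-buffer period only if P_n(r)/P_n(d) exceeds the given threshold."""
    return poisson_tail(n, lam * r) / poisson_tail(n, lam * d) > threshold


# Threshold chosen for illustration only.
for r in (0.05, 0.2, 0.5, 0.9):
    print(r, biasing_active(0.8, r, 1.0, 4, threshold=1e-4))
```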

Now let us consider the model of the LB algorithm with a Poisson cell arrival process at rate $\lambda = 0.8$ cells per unit of time. The new (optimal) arrival process to reach the full-buffer state is also Poisson, but at an increased rate $\lambda^* = \lambda + \theta^*$, where (from Equation (12)) $\theta^*$ is the non-trivial solution of $\lambda e^{\theta^* d}/(\lambda + \theta^*) = 1$. The (deterministic) service time is set to $d = 1$ time unit, $k = 10$, and we vary the number of consecutive cell losses, $n$. In Table 2, we list fast simulation estimates of $\gamma_n$ and $E(O_n)$ as well as their relative error (i.e., the relative half-width of the 99% confidence interval) in percentage. In total, 25600 “biased” busy cycles were used to obtain these estimates. Using different arrival rates and/or service times, the best relative error (for high values of $n$) is obtained by setting $p_s$ (approximately) to $1 - \lambda d$. Therefore, the “biasing” probability $p_s$ is heuristically set to $\max(P'_n(r), 1 - \lambda d)$, where $P'_n(r)$ is the (new) probability of $n$ or more arrivals during the full-buffer period $r$ in the simulated system (i.e., with the increased optimal arrival rate $\lambda^*$). For the simulated M/D/1/k queue, it follows that $P'_n(r) = e^{-\lambda^* r}\sum_{i=n}^{\infty} (\lambda^* r)^i/i!$. Note that if “biasing” is not activated in a full-buffer period because $r \ll d$, then $p_s = P_n(r)$ and the likelihood ratio is not updated at the end of the full-buffer period. Using the same effort (in CPU time), standard simulation yields meaningful results for only two entries with relatively high probabilities. As can be seen, the relative error of the fast simulation estimates slowly increases with $n$, which is an indication that the importance sampling heuristic is not asymptotically efficient with respect to $n$. As explained in Section 3.2, this is due to the increased variability of $P_n(V)$ for higher $n$, where $V$ is a r.v. denoting the length of a full-buffer period. Note that $E(O_n) \approx \gamma_n$ for values of $n > 4$, which validates our remark in Section 2.2 for queues other than the M/M/1/k.
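For the M/D/1 model, Equation (12) becomes $\lambda e^{\theta d}/(\lambda+\theta) = 1$, i.e., $h(\theta) = \theta d - \ln(1 + \theta/\lambda) = 0$. The sketch below (hypothetical names) solves it by bisection for the example's $\lambda = 0.8$ and $d = 1$, giving $\theta^* \approx 0.431$ and $\lambda^* \approx 1.231$, and then evaluates the heuristic $p_s = \max(P'_n(r), 1 - \lambda d)$. The lower bracket assumes $\lambda d < 1$.

```python
import math


def poisson_tail(n, x):
    """P(N >= n) for N ~ Poisson(x), summed upward from i = n."""
    if n <= 0:
        return 1.0
    if x <= 0.0:
        return 0.0
    term = math.exp(-x + n * math.log(x) - math.lgamma(n + 1))
    total, i = 0.0, n
    while True:
        total += term
        i += 1
        term *= x / i
        if term == 0.0 or (i > x and term < 1e-16 * total):
            return total


def md1_theta_star(lam, d, tol=1e-12):
    """Non-trivial root of theta*d - ln(1 + theta/lam) = 0 (Equation (12) for M/D/1)."""
    h = lambda th: th * d - math.log1p(th / lam)
    lo, hi = 1e-9, 1.0               # h(lo) < 0 when lam * d < 1
    while h(hi) <= 0:                # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ps_md1(n, r, lam, d, lam_star):
    """Heuristic biasing probability max(P'_n(r), 1 - lam*d) under the tilted rate."""
    return max(poisson_tail(n, lam_star * r), 1.0 - lam * d)


lam, d = 0.8, 1.0
theta = md1_theta_star(lam, d)
lam_star = lam + theta               # tilted (optimal) Poisson arrival rate
print(f"theta* = {theta:.4f}, lambda* = {lam_star:.4f}")
for n in (1, 2, 4, 8):
    print(n, ps_md1(n, 1.0, lam, d, lam_star))
```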

4.4  Bursty  Cell  Arrival  Process 

In this section we consider the evaluation of the LB algorithm for a more realistic two-phase burst/silence cell arrival process (see Rathgeb (1991)), which we will refer to as the TPBS process. This arrival process has been used to model bursty sources, such as packetized voice (see Heffes and Lucantoni (1986)) and interactive data services, and, therefore, it is often used to compare various policing mechanisms. The number of cells per burst is geometrically distributed with a parameter $\alpha$, and the inter-cell time during a burst is deterministic, given by $\tau$. Therefore, transitions from burst to silence occur with probability $\alpha$, only at multiples of $\tau$. The duration of the silence phase is exponentially distributed with mean $\beta^{-1}$. The peak cell arrival “rate” is $1/\tau$, and the average cell arrival “rate” is $\lambda = (\tau + \alpha/\beta)^{-1}$. Note that we can increase the burstiness of the cell arrival process by increasing the average burst length (i.e., smaller $\alpha$) while keeping the average cell “rate” the same (i.e., constant $\alpha/\beta$). The pdf of the TPBS inter-arrival time and its moment generating function are given by


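$$f_A(t) = (1 - \alpha)\,\delta(t - \tau) + \alpha\beta\, e^{-\beta(t - \tau)}\,\mathbf{1}\{t > \tau\}, \qquad F_A(\theta) = e^{\theta\tau}\left[(1 - \alpha) + \frac{\alpha\beta}{\beta - \theta}\right], \quad \theta < \beta.$$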
$g_A(t) = f_A^{-\theta^*}(t)$ is the corresponding exponentially tilted pdf (with a tilting parameter $-\theta^*$); its moment generating function is given by $G_A(\theta) = F_A(\theta - \theta^*)/F_A(-\theta^*)$. It can be shown that the tilted pdf, $g_A(t)$, is also a TPBS process with the same deterministic burst inter-cell time $\tau$, and with parameters $\beta^* = \beta + \theta^*$ and $\alpha^* = \alpha\beta/(\beta + (1 - \alpha)\theta^*)$. The tilted pdf, $g_A(t)$, is used as the (new) inter-arrival time density for simulation with importance sampling to reach the full-buffer state.
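For the TPBS source with deterministic service $d$, Equation (12) reads $\ln F_A(-\theta) + \theta d = 0$ with $F_A$ as given above. The sketch below (our own, with hypothetical names) finds $\theta^*$ by bisection, forms $\alpha^*$ and $\beta^*$, and numerically checks that the tilted moment generating function coincides with that of a TPBS process with parameters $(\alpha^*, \beta^*, \tau)$. The parameter values are those of the first experiment in this section, reading the silence rate as $\beta = 5.0 \times 10^{-4}$.

```python
import math


def tpbs_log_mgf(alpha, beta, tau, theta):
    """ln F_A(theta) for the TPBS inter-arrival time; requires theta < beta."""
    return theta * tau + math.log((1.0 - alpha) + alpha * beta / (beta - theta))


def tpbs_theta_star(alpha, beta, tau, d):
    """Non-trivial root of ln F_A(-theta) + theta*d = 0 (Equation (12), service time d)."""
    h = lambda th: tpbs_log_mgf(alpha, beta, tau, -th) + th * d
    lo, hi = 1e-12, 1.0        # h(lo) < 0 when the load lambda*d is below 1
    while h(hi) <= 0:          # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):       # plain bisection
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha, beta, tau, d = 0.2, 5.0e-4, 1.0, 100.0
theta = tpbs_theta_star(alpha, beta, tau, d)
beta_star = beta + theta
alpha_star = alpha * beta / (beta + (1.0 - alpha) * theta)
print(f"theta* = {theta:.6f}, alpha* = {alpha_star:.4f}, beta* = {beta_star:.6f}")
print("mean inter-arrival:", tau + alpha / beta, "->", tau + alpha_star / beta_star)
```

With these values the mean inter-arrival time drops from 401 to roughly 26 time units under the tilt: bursts become longer ($\alpha^* < \alpha$) and silences shorter ($\beta^* > \beta$).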


For a TPBS cell arrival process, the LB algorithm can be modelled as a TPBS/D/1/k queue. The full-buffer and consecutive-cell-loss events are typically rare, so importance sampling is used to simulate this system efficiently. At the beginning of each busy cycle, and after each full-buffer period (as long as the rare consecutive-cell-loss event has not occurred), “biasing” to reach the next full-buffer period is effected as described in Section 3.1. The new “biased” (TPBS) cell arrival process is determined by $\alpha^*$, $\beta^*$ and $\tau$ as given above. The service time is deterministic ($d$) and therefore remains unchanged in the simulated system. As soon as the full-buffer state is reached, further “biasing” during the full-buffer period may be necessary to accelerate the $n$-consecutive-cell-loss event. Since the inter-cell time during a burst ($\tau$) is deterministic, the number of cells that may be lost during a full-buffer period of length $r$ cannot exceed $n_{\max} = \lfloor r/\tau \rfloor$. At the beginning of a full-buffer period of length $r$, if $n < n_{\max}$, then “biasing” is done by setting the new $\beta$ to $\beta^*$. If $\alpha^*$ does not raise the probability of $n$ or more remaining cells in the current burst to a sufficiently high value $p_s$ (for example, $p_s = 0.5$), then the new $\alpha$ is set to $\alpha_s$, determined from $(1-\alpha_s)^n = p_s$ (i.e., $\alpha_s = 1 - e^{\ln(p_s)/n}$). In other words, until $n$ cells have been lost consecutively, we use the optimal “biasing” to reach the full-buffer state (i.e., the new $\alpha$ is set to $\alpha^*$ and the new $\beta$ to $\beta^*$). In addition, depending on $n$ and $r$, stronger “biasing” during the full-buffer period may be necessary (i.e., if $n < n_{\max}$, then the new $\alpha$ is set to $\min(\alpha^*, \alpha_s)$). The effectiveness of this heuristic is demonstrated in one example. In another example, we use the heuristic to experiment with the burstiness of the cell arrival process.
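The full-buffer biasing rule fits in a few lines. This Python sketch is a hypothetical illustration of the heuristic as described, not the authors' implementation. It assumes that $n$ is the target number of consecutive losses, that $r$ and $\tau$ are in the same time units, and that outside full-buffer periods the pair $(\alpha^*, \beta^*)$ is always used. The function names are our own.

```python
import math


def alpha_s(n, p_s=0.5):
    """Solve (1 - alpha_s)**n = p_s, i.e. P(at least n more cells in the burst) = p_s."""
    return 1.0 - math.exp(math.log(p_s) / n)


def full_buffer_bias(alpha_star, beta_star, n, r, tau, p_s=0.5):
    """(alpha, beta) used from the start of a full-buffer period of length r."""
    n_max = math.floor(r / tau)          # at most n_max burst cells can arrive within r
    if n < n_max:
        # stronger biasing when alpha* leaves too small a chance of n more burst cells
        return min(alpha_star, alpha_s(n, p_s)), beta_star
    return alpha_star, beta_star         # otherwise keep the optimal full-buffer biasing


if __name__ == "__main__":
    a = alpha_s(5)                       # n = 5, p_s = 0.5
    print(round(a, 4), round((1.0 - a) ** 5, 4))
    print(full_buffer_bias(0.3, 1.0e-3, n=5, r=100.0, tau=1.0))
```

When $\alpha^*$ is already smaller than $\alpha_s$, the `min` leaves the optimal biasing unchanged. This matches the observation in the first experiment that stronger biasing is unnecessary for small $n$.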


In the first experiment, we consider a TPBS cell arrival process with $\alpha = 0.2$, $\beta = 5.0 \times 10^{-4}$ and $\tau = 1$. The (deterministic) service time, $d$, is set to 100 time units, and $k$ is set to 30. In Table 3, the number of consecutive cell losses, $n$, is varied, and we give fast simulation estimates of $\gamma_n$ and $\Gamma_n$ with their percentage relative error (i.e., the relative half-width of the 99% confidence interval). A total of 25 600 “biased” busy cycles were used to obtain these estimates. For all $n$, the “biasing” probability $p_s$ is set to 0.5. Although it cannot be seen directly from the table, it is interesting to note that, for smaller values of $n$, stronger “biasing” during full-buffer periods is not necessary (i.e., the new $\alpha$ is set to $\alpha^*$). For relatively high consecutive-cell-loss probabilities, it was possible to compare with results from standard simulation using the same effort (in CPU time). Note that the relative error of the fast simulation estimates increases slowly with $n$, i.e., the importance sampling heuristic is not asymptotically efficient with respect to $n$. A similar observation was made in the experiment for the M/D/1/k queue in Section 4.3.
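The percentage relative error quoted in the tables is the half-width of a 99% confidence interval divided by the point estimate. The Python sketch below computes it from a list of per-cycle observations. It assumes independent observations and a normal approximation. For importance sampling, each observation would be the likelihood ratio of a biased busy cycle multiplied by the indicator of the event of interest. The function name is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def relative_error_percent(samples, confidence=0.99):
    """Point estimate and relative half-width (in %) of a normal-approximation CI."""
    m = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 2.576 for 99%
    half_width = z * sd / math.sqrt(m)
    return mean, 100.0 * half_width / mean


if __name__ == "__main__":
    est, rel = relative_error_percent([1.0, 2.0, 3.0, 4.0, 5.0])
    print(est, round(rel, 2))
```

The half-width shrinks as $1/\sqrt{m}$, so halving the relative error at a fixed variance would require four times as many busy cycles.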


In the second experiment, we consider a TPBS arrival process in which we increase the burstiness while fixing the average cell arrival “rate.” As described earlier in this section, this can be achieved by decreasing $\alpha$ and $\beta$ while keeping $\alpha/\beta$ fixed. We set $\tau = 1$ and $\lambda = 1/50$. Since the mean inter-arrival time is $\tau + \alpha/\beta = 1/\lambda$, it follows that $\alpha/\beta$ is fixed at 49. The (deterministic) service time, $d$, is set to 25 time units, and $k$ is set to 100. For a fixed number of consecutive cell losses, $n = 5$, Table 4 varies the burstiness and gives the fast simulation estimates of $\gamma_n$ and $\Gamma_n$ with their percentage relative error. A total of 25 600 “biased” busy cycles were used to obtain these estimates. For all values of $\alpha$, the “biasing” probability $p_s$ is set to 0.5. With the same effort (in CPU time), standard simulation yields meaningful results only for relatively high probabilities. As expected, the empirical results in Table 4 show a sharp increase in the consecutive-cell-loss probability as burstiness increases.
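The parameter pairs behind Table 4 follow from fixing the mean inter-arrival time $\tau + \alpha/\beta = 1/\lambda$. This relation is our reading of the TPBS model above. The short Python sketch below uses a hypothetical helper to compute $\beta$ and the mean burst length $1/\alpha$ (in cells) for each $\alpha$ in the table, together with the offered load $\lambda d$.

```python
def beta_for_rate(alpha, lam, tau):
    """beta such that the mean inter-arrival time tau + alpha/beta equals 1/lam."""
    gap = 1.0 / lam - tau              # mean extra gap per cell, alpha/beta (49 here)
    if gap <= 0:
        raise ValueError("1/lam must exceed tau")
    return alpha / gap


if __name__ == "__main__":
    lam, tau, d = 1.0 / 50.0, 1.0, 25.0
    print(f"offered load lam*d = {lam * d:.2f}")
    for alpha in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40):
        beta = beta_for_rate(alpha, lam, tau)
        print(f"alpha={alpha:.2f}  beta={beta:.3e}  mean burst length={1 / alpha:.1f} cells")
```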
5  CONCLUSIONS

In this paper we have proposed a heuristic importance sampling change of measure to estimate efficiently the probability of a rare consecutive-cell-loss event in a GI/GI/1/k queue. This heuristic uses the optimal change of measure proposed by Parekh and Walrand (1989) to accelerate the occurrence of a rare full-buffer event in an asymptotically stable queue. However, further “biasing” is necessary to increase the probability of a rare consecutive-cell-loss event during a full-buffer period. Experimental results demonstrate the validity and effectiveness of our fast simulation method, which is used to evaluate a GI/D/1/k queueing model of the Leaky Bucket algorithm.
Table 1  Estimates of $\gamma_n$ and $E(O_n)$ in an M/M/1/k Queue

| | $\gamma_n$ (Fast Sim.) | $\gamma_n$ (Anal.) | $E(O_n)$ (Fast Sim.) | $E(O_n)$ (Anal.) |
|---|---|---|---|---|
| full-buffer | 9.45 × 10^-4 ± 3.20% | 9.48 × 10^-4 | 4.66 × 10^-3 ± 4.52% | 4.72 × 10^-3 |
| n = 1 | 7.69 × 10^-4 ± 3.16% | 7.58 × 10^-4 | 2.10 × 10^-3 ± 4.20% | 2.10 × 10^-3 |
| n = 4 | 1.58 × 10^-4 ± 3.25% | 1.59 × 10^-4 | 1.81 × 10^-4 ± 3.50% | 1.84 × 10^-4 |
| n = 8 | 7.12 × 10^-6 ± 3.21% | 7.15 × 10^-6 | 7.15 × 10^-6 ± 3.21% | 7.19 × 10^-6 |
| n = 16 | 1.09 × 10^-8 ± 3.21% | 1.09 × 10^-8 | 1.09 × 10^-8 ± 3.21% | 1.09 × 10^-8 |
| n = 32 | 2.53 × 10^-14 ± 3.21% | 2.54 × 10^-14 | 2.53 × 10^-14 ± 3.21% | 2.54 × 10^-14 |
| n = 64 | 1.36 × 10^-25 ± 3.21% | 1.36 × 10^-25 | 1.36 × 10^-25 ± 3.21% | 1.36 × 10^-25 |

Table 2  Estimates of $\gamma_n$ and $E(O_n)$ in an M/D/1/k Queue

| | $\gamma_n$ (Std. Sim.) | $\gamma_n$ (Fast Sim.) | $E(O_n)$ (Std. Sim.) | $E(O_n)$ (Fast Sim.) |
|---|---|---|---|---|
| full-buffer | 1.00 × 10^-2 ± 4.49% | 9.92 × 10^-3 ± 2.15% | 4.87 × 10^-2 ± 5.99% | 4.80 × 10^-2 ± 3.21% |
| n = 1 | 6.47 × 10^-3 ± 4.76% | 6.38 × 10^-3 ± 2.22% | 1.43 × 10^-2 ± 5.91% | 1.40 × 10^-2 ± 3.00% |
| n = 4 | – | 8.20 × 10^-5 ± 3.48% | – | 8.28 × 10^-5 ± 3.50% |
| n = 8 | – | 9.68 × 10^-9 ± 4.23% | – | 9.68 × 10^-9 ± 4.23% |
| n = 12 | – | 2.15 × 10^-13 ± 4.97% | – | 2.15 × 10^-13 ± 4.97% |
| n = 16 | – | 1.49 × 10^-18 ± 5.56% | – | 1.49 × 10^-18 ± 5.56% |
Table 3  Estimates of $\gamma_n$ and $\Gamma_n$ in a TPBS/D/1/k Queue

| | $\gamma_n$ (Std. Sim.) | $\gamma_n$ (Fast Sim.) | $\Gamma_n$ (Std. Sim.) | $\Gamma_n$ (Fast Sim.) |
|---|---|---|---|---|
| full-buffer | 6.14 × 10^-3 ± 8.46% | 6.15 × 10^-3 ± 0.87% | 1.29 × 10^-3 ± 9.50% | 1.28 × 10^-3 ± 1.86% |
| n = 1 | 5.14 × 10^-3 ± 9.16% | 5.19 × 10^-3 ± 0.87% | 1.01 × 10^-3 ± 10.13% | 1.02 × 10^-3 ± 1.82% |
| n = 2 | 4.27 × 10^-3 ± 9.97% | 4.36 × 10^-3 ± 0.87% | 8.12 × 10^-4 ± 10.90% | 8.18 × 10^-4 ± 1.78% |
| n = 4 | 2.82 × 10^-3 ± 12.06% | 2.99 × 10^-3 ± 0.89% | 4.96 × 10^-4 ± 12.88% | 5.21 × 10^-4 ± 1.73% |
| n = 8 | 1.18 × 10^-3 ± 17.91% | 1.31 × 10^-3 ± 1.08% | 1.88 × 10^-4 ± 18.37% | 2.10 × 10^-4 ± 1.77% |
| n = 16 | – | 2.23 × 10^-4 ± 1.49% | – | 3.41 × 10^-5 ± 2.01% |
| n = 32 | – | 5.70 × 10^-6 ± 2.20% | – | 8.62 × 10^-7 ± 2.57% |
| n = 64 | – | 2.76 × 10^-9 ± 3.51% | – | 4.18 × 10^-10 ± 3.75% |
Table 4  Estimates of $\gamma_n$ and $\Gamma_n$ in a TPBS/D/1/k Queue

| | $\gamma_n$ (Std. Sim.) | $\gamma_n$ (Fast Sim.) | $\Gamma_n$ (Std. Sim.) | $\Gamma_n$ (Fast Sim.) |
|---|---|---|---|---|
| α = 0.05 | 3.07 × 10^-2 ± 4.04% | 3.12 × 10^-2 ± 1.50% | 2.20 × 10^-3 ± 5.22% | 2.22 × 10^-3 ± 2.83% |
| α = 0.10 | 1.55 × 10^-3 ± 14.46% | 1.53 × 10^-3 ± 1.53% | 1.61 × 10^-4 ± 17.54% | 1.53 × 10^-4 ± 2.80% |
| α = 0.15 | 6.72 × 10^-5 ± 58.73% | 6.39 × 10^-5 ± 1.55% | 9.48 × 10^-6 ± 71.29% | 7.77 × 10^-6 ± 2.73% |
| α = 0.20 | – | 2.18 × 10^-6 ± 1.60% | – | 3.13 × 10^-7 ± 2.70% |
| α = 0.25 | – | 6.09 × 10^-8 ± 1.65% | – | 9.80 × 10^-9 ± 3.18% |
| α = 0.30 | – | 1.34 × 10^-9 ± 1.72% | – | 2.44 × 10^-10 ± 3.12% |
| α = 0.35 | – | 2.26 × 10^-11 ± 1.71% | – | 4.58 × 10^-12 ± 2.54% |
| α = 0.40 | – | 2.86 × 10^-13 ± 1.79% | – | 6.42 × 10^-14 ± 3.03% |
Abstract

While direct cell-level simulation accurately predicts congestion in cell-switched networks,
excessive run-times are often required to obtain significant results. Methods of accelerated
simulation have therefore been developed, examples of which include the cell rate technique
(which  represents  the  discrete  cell-streams  as  continuous  fluids)  and  the  histogram  method 
(which  merges  the  multiplexed  streams  into  an  aggregate  cell-rate  histogram  and  performs 
independent  statistical  analysis  on  each  bin).  The  current  work  applies  both  these  techniques 
to  a  simple  ATM  multiplexer  and  explores  their  respective  advantages  and  drawbacks.  While 
the  cell-rate  method  provides  accurate  predictions  under  a  rapidly  varying  bit  rate,  the 
histogram  method  is  more  successful  under  quasi-static  conditions.  This  suggests  the 
possibility  of  a  hybrid  cell-rate/histogram  model  which  is  accurate  at  both  extremes. 

Keywords 

ATM  networks,  simulation  techniques,  statistical  analysis. 


1  INTRODUCTION 

The recent proliferation of cell-switched communication networks has led to increasingly complex problems in their design, evaluation and management. Such problems, many of which arise from congestion as virtual channels are multiplexed, lead to cell losses and transmission-time jitter, the latter being particularly harmful to real-time services such as video.
Various computer-aided techniques have been devised for the analysis of networking problems (see Kurose and Mouftah 1988 and Frost et al. 1988). The most direct approach is cell-level simulation, in which network components are represented directly within the software, and cell arrivals and transmissions are mimicked by pseudorandom sequences. Such simulations are highly processor-intensive, and enormous run-times are often required to simulate relatively short periods of operation. For example, the Orwell simulator recently developed at Loughborough University (Parish et al., 1994) can take several hours to simulate one minute of real time. Since acceptable loss rates are of the order of $10^{-9}$ (i.e. 1 lost cell in $10^9$), several weeks may be required to obtain a statistically significant characterization.

For this reason, numerous workers have investigated accelerated simulation techniques, which allow run-times to be reduced without major loss of accuracy. One example is variance reduction, which manipulates the statistical properties of a cell-level model to reduce the stochastic variability of its output, thus shortening the run-time needed to obtain statistical significance (Frost et al., 1988). However, the current paper concentrates on the following recently published techniques:

•  The cell rate method, developed by Pitts et al. (1994a,b) at Queen Mary & Westfield College, London, represents the various discrete cell streams applied to the input buffer of an ATM multiplexer as continuous fluids whose flow rates are modulated by the bulk-traffic characteristics. This method has been shown to produce accurate cell-loss predictions in the burst-scale region, where the aggregate cell rate exceeds the channel capacity.

•  The histogram method, introduced by Skelly et al. (1993) at Columbia University, NY, converts the incoming cell streams of an ATM multiplexer into arrival-rate histograms and convolves them to form an aggregate histogram. Statistical queueing analysis is applied separately to each histogram bin, and the results are then combined as a weighted sum. A fundamental assumption of this model is that the system is in statistical quasi-equilibrium, so the method is unsuitable for rapidly varying bit-rates.

The current paper applies variants of both these techniques to a simple two-channel multiplexer. The predictions are compared with the results of a stochastic cell-level simulator, and their respective accuracies and run-times are contrasted.


2  CELL-LEVEL  SIMULATION 

Before the accuracy of any accelerated simulation technique could be tested, it was first necessary to establish a cell-level simulator against which its predictions could be compared. Figure 1(a) shows a schematic diagram of the ATM multiplexer modelled in the software (which was written in Turbo-C and ran on a 486-based desktop microcomputer). Time was quantised into cycles. In each cycle, up to one cell could arrive on each input channel and up to one cell could be read by the server. The server operated in a geometric mode, with a constant probability ($\mu$) per cycle of a cell being read.

Each  of  the  two  buffer  inputs  could  be  fed  with  any  user-defined  data-stream.  If  both 
channels  generated  a  cell  within  the  same  cycle  (i.e.  a  batch  arrival)  both  were  simultaneously 
loaded  onto  the  buffer  in  a  randomly  selected  order  (i.e.  each  cell  had  equal  probability  of 
getting  first  place  in  the  buffer). 
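A cell-level simulator of this kind can be sketched compactly. The Python sketch below is a hypothetical illustration and is not the authors' Turbo-C program. It assumes a particular timing within each cycle: arrivals are loaded first (in random order if both channels fire), and then the server reads the head-of-line cell with probability $\mu$. It also assumes that the buffer holds at most $N$ cells and that delay is counted in cycles, including the arrival cycle. All names are our own.

```python
import random
from collections import deque


def simulate(arrival_probs, mu, N, cycles, seed=1):
    """Discrete-time multiplexer: Bernoulli inputs, FIFO buffer of size N, geometric server."""
    rng = random.Random(seed)
    buffer = deque()                          # arrival cycle of each queued cell
    arrived = [0] * len(arrival_probs)
    lost = [0] * len(arrival_probs)
    delays = []                               # queueing delay (cycles) of each served cell
    for t in range(cycles):
        batch = [ch for ch, p in enumerate(arrival_probs) if rng.random() < p]
        rng.shuffle(batch)                    # batch arrivals enter in random order
        for ch in batch:
            arrived[ch] += 1
            if len(buffer) < N:
                buffer.append(t)
            else:
                lost[ch] += 1                 # buffer full: the cell is lost
        if buffer and rng.random() < mu:      # geometric service
            delays.append(t - buffer.popleft() + 1)
    return arrived, lost, delays, len(buffer)


if __name__ == "__main__":
    arrived, lost, delays, left = simulate([0.2], mu=0.4, N=20, cycles=200_000)
    print(arrived, lost, sum(delays) / len(delays))
```

Recording per-channel losses is what makes the random loading order matter: when only one buffer place remains, it decides which channel loses the cell.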
Figure 1: Cell-level simulation of channel interaction in an ATM multiplexer. (a) Schematic diagram of ATM multiplexer; (b) simulated cell-delay distributions.


Figure 1(c) shows some typical results obtained from the simulator using a buffer length ($N$) of 20 and a geometric service probability of 0.4 per time-slot. First, an unmodulated Bernoulli stream (arrival probability $\lambda = 0.2$) was sent through the buffer on Channel 1 with Channel 2 inactive, and the cell-delay distribution was recorded. The experiment was then repeated with an additional 2-state Markov-modulated Bernoulli stream applied to Channel 2 (Figure 1(b)). The resulting deterioration of transmission quality (i.e. increased cell delay) is clearly visible in the results (Figure 1(c)). The upper mode in the cell-delay distribution represents the burst component, where the aggregate cell arrival rate exceeds the server capacity and the buffer is usually full.


3  HISTOGRAM  SIMULATION 


The analysis presented in this section assumes that the buffer is in statistical equilibrium, and hence that the equilibrium probabilities $\Pi_0, \ldots, \Pi_N$ remain constant with time ($\Pi_n$ is the probability that the buffer contains $n$ cells).


Statistical  Queueing  Analysis 

If only a single Bernoulli stream is applied, then the buffer can be modelled as a discrete-time Geo/Geo/1/N queue, the solution of which is a matter of simple textbook theory. With $\lambda$ and $\mu$ the per-cycle arrival and service probabilities, the equilibrium probabilities for $0 \le n < N$ are given by:

$$\Pi_n = \frac{(1-\gamma)\,\gamma^n}{1 - \frac{\lambda}{\mu}\gamma^N}, \qquad \text{where} \quad \gamma = \frac{\lambda(1-\mu)}{\mu(1-\lambda)}, \tag{1}$$

while for $n = N$:

$$\Pi_N = \frac{(1-\gamma)(1-\lambda)\,\gamma^N}{1 - \frac{\lambda}{\mu}\gamma^N}. \tag{2}$$

Figure 2: Simulated behaviour of Geo/Geo/1/N buffer compared with analytical model predictions. (Discrete points indicate simulations; solid lines indicate the analytical model.)


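If an arriving cell finds $n$ ($< N$) cells ahead of it, then it remains in the buffer until the buffer has been read $n+1$ times. Each read occurs independently with probability $\mu$ per cycle, so the $(n+1)$th read falls in cycle $k$ with negative-binomial probability. Hence the probability that the queueing delay equals $k$ cycles is given by

$$P(k) = \sum_{n=0}^{N-1} \Pi_n \binom{k-1}{n} \mu^{n+1} (1-\mu)^{k-1-n}. \tag{3}$$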


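Substituting Eqn. (1) for $\Pi_n$ and simplifying yields

$$P(k) = \Pi_0\,\mu\,\frac{(1-\mu)^{k-1}}{(1-\lambda)^{k-1}} \left[1 - \sum_{n=N}^{k-1} \binom{k-1}{n} \lambda^n (1-\lambda)^{k-1-n}\right]. \tag{4}$$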


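Since the final sum in Eqn. (4) (taken as zero for $k < N+1$) is a truncated binomial series, it may be replaced by the regularized incomplete beta function $I_\lambda(N, k-N)$. The expression may now be rewritten:

$$P(k) = \Pi_0\,\mu\,\frac{(1-\mu)^{k-1}}{(1-\lambda)^{k-1}} \left[1 - I_\lambda(N, k-N)\right]. \tag{5}$$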


If an incoming cell finds $N$ cells already in the buffer, then the buffer is full and the new cell is lost. Hence the loss probability equals $\Pi_N$ and may be computed using Equation 2. Figure 2 compares the analytical cell-loss and cell-delay characteristics with the results of cell-level simulations. (Numerical values of the incomplete beta function $I_x$ were computed using the algorithm supplied in Numerical Recipes in C by Press et al. (1988).)

Figure 3: Example of single-channel histogram simulation. (a) Transmission-rate profile; (c) queueing delay profile. (Axis labels: frame number; transmission probability.)
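Equations (1) to (5) are straightforward to evaluate. The Python sketch below computes $\Pi_0, \ldots, \Pi_N$, the loss probability $\Pi_N$ and the delay distribution $P(k)$ in two ways: as the direct sum of Eqn. (3) and in the form of Eqn. (5). In the second form, the bracket $1 - I_\lambda(N, k-N)$ is evaluated as a binomial cumulative probability rather than with the Numerical Recipes routine. A library regularized incomplete beta function, such as `scipy.special.betainc`, could be substituted. The sketch assumes $\lambda \neq \mu$ (so that $\gamma \neq 1$), and the function names are hypothetical.

```python
import math


def geo_geo_1_n(lam, mu, N):
    """Equilibrium probabilities Pi_0..Pi_N of the Geo/Geo/1/N queue, Eqns (1)-(2)."""
    g = lam * (1.0 - mu) / (mu * (1.0 - lam))
    if math.isclose(g, 1.0):
        raise ValueError("the formulae assume gamma != 1, i.e. lam != mu")
    denom = 1.0 - (lam / mu) * g ** N
    pi = [(1.0 - g) * g ** n / denom for n in range(N)]
    pi.append((1.0 - g) * (1.0 - lam) * g ** N / denom)
    return pi


def delay_pmf_direct(k, lam, mu, N, pi):
    """Eqn (3): the cell finds n < N cells ahead and its (n+1)-th read falls in cycle k."""
    return sum(pi[n] * math.comb(k - 1, n) * mu ** (n + 1) * (1.0 - mu) ** (k - 1 - n)
               for n in range(min(N, k)))


def delay_pmf_closed(k, lam, mu, N, pi0):
    """Eqn (5), with 1 - I_lam(N, k-N) written as P[Binomial(k-1, lam) <= N-1]."""
    m = k - 1
    binom_cdf = sum(math.comb(m, j) * lam ** j * (1.0 - lam) ** (m - j)
                    for j in range(min(N, m + 1)))
    return pi0 * mu * ((1.0 - mu) / (1.0 - lam)) ** m * binom_cdf


if __name__ == "__main__":
    lam, mu, N = 0.2, 0.4, 20
    pi = geo_geo_1_n(lam, mu, N)
    print("loss probability:", pi[N])
    for k in (1, 5, 25):
        print(k, delay_pmf_direct(k, lam, mu, N, pi), delay_pmf_closed(k, lam, mu, N, pi[0]))
```

As a consistency check, the delay probabilities summed over all $k$ equal $1 - \Pi_N$, the probability that an arriving cell is accepted.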


Single  Channel  Histogram  Simulation 

Figure 3(a) shows an example of the simulated variable bit rate (VBR) video profiles used in this study. (The video simulation was based on the output of an experimental VBR codec during the compression of "head-and-shoulders" image sequences. The occasional high cell-rate excursions correspond to scene changes within the sequence, while the smaller variations indicate activity within individual scenes.) Each video frame lasted 28 276 cycles, during which the arrival probability $\lambda$ remained constant. (This period was long enough for the assumption of statistical equilibrium to be approximately valid.)

Figure 3(b) shows the same video profile expressed as a 50-bin arrival-probability histogram. Independent statistical queueing analysis was performed on each bin (for a buffer size of $N = 20$ and service rate $\mu = 0.3$). The results were then weighted according to their relative frequencies and summed to obtain the overall delay and loss characteristics. Figure 3(c) shows the resulting queueing-delay distribution (solid lines) compared with a cell-level simulation of the same scenario (discrete points). The histogram model gave a cell-loss ratio of 2.56%, compared with 2.14% from the cell-level simulation.

Since the distributions in Figure 3(c) are practically indistinguishable, the histogram curve clearly provides a highly accurate approximation to the simulated data. The histogram results were obtained in approximately 1% of the cell-level run-time (111 seconds, compared with 10 494 seconds for the cell-level simulation). The experiment therefore illustrates the potential value of the histogram method as a means of reducing simulation run-time.
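The single-channel histogram procedure can be sketched as follows. The code bins a per-frame arrival-probability profile into equal-width bins and evaluates the Geo/Geo/1/N loss probability $\Pi_N$ of Eqn. (2) at each bin centre. It then combines the results weighted by relative frequency, as described above. The sketch assumes frames of equal duration, $\lambda \neq \mu$ at every bin centre, and that each bin is represented by its centre. The optional rate weighting, which turns the frequency-weighted loss probability into a per-cell loss ratio, is our addition. All names and the synthetic profile are hypothetical. The delay distribution would be combined in the same way.

```python
def loss_probability(lam, mu, N):
    """Pi_N of the Geo/Geo/1/N queue, Eqn (2); assumes lam != mu."""
    g = lam * (1.0 - mu) / (mu * (1.0 - lam))
    return (1.0 - g) * (1.0 - lam) * g ** N / (1.0 - (lam / mu) * g ** N)


def histogram_loss(profile, mu, N, bins=50, rate_weighted=False):
    """Histogram method: equilibrium analysis per bin, then a weighted sum over bins."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        centres, counts = [lo], [len(profile)]
    else:
        width = (hi - lo) / bins
        counts = [0] * bins
        for lam in profile:
            counts[min(int((lam - lo) / width), bins - 1)] += 1
        centres = [lo + (i + 0.5) * width for i in range(bins)]
    num = den = 0.0
    for lam, c in zip(centres, counts):
        if c == 0:
            continue
        w = c / len(profile)              # relative frequency of this bin
        if rate_weighted:
            w *= lam                      # weight by arrivals rather than by time
        num += w * loss_probability(lam, mu, N)
        den += w
    return num / den


if __name__ == "__main__":
    import random
    rng = random.Random(3)
    profile = [min(0.9, max(0.0, rng.gauss(0.2, 0.05))) for _ in range(2000)]  # synthetic
    print(histogram_loss(profile, mu=0.3, N=20))
```

The expensive step is a single closed-form evaluation per occupied bin, independent of the number of cycles per frame. This reflects where the histogram method saves run-time.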
Figure 4: Typical results of two-channel histogram simulation.


Two-Channel  Histogram  Simulation 

Since the primary focus of this paper is the interaction of competing virtual channels in a common buffer, the above theory must be extended to cover the effects of multiple Bernoulli inputs. If the cell arrival probabilities for channels 1 and 2 are $\lambda_1$ and $\lambda_2$ respectively, and $p_n$ is the probability of $n$ arrivals per cycle, then:

$$p_0 = (1-\lambda_1)(1-\lambda_2) = 1 - (\lambda_1 + \lambda_2) + \lambda_1\lambda_2 \tag{6}$$

$$p_1 = \lambda_1(1-\lambda_2) + \lambda_2(1-\lambda_1) = \lambda_1 + \lambda_2 - 2\lambda_1\lambda_2 \tag{7}$$

$$p_2 = \lambda_1\lambda_2 \tag{8}$$

The resultant arrival stream is an example of a batch Bernoulli process (maximum batch size 2). Its effects on discrete-time queueing have been studied analytically by Dafermos and Neuts (1971) and, more recently, by Hashida et al. (1991). However, the current analysis employs the following simplifying assumption: if $\lambda_1$ and $\lambda_2$ are both $\ll 1$, then $\lambda_1\lambda_2$ becomes negligible, and the aggregate stream approximates a standard Bernoulli process with arrival probability $\lambda_1 + \lambda_2$.
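Eqns (6) to (8) and the aggregation approximation can be checked with a few lines of code. The Python sketch below computes $p_0$, $p_1$ and $p_2$. It shows that the aggregate stream has mean $p_1 + 2p_2 = \lambda_1 + \lambda_2$ exactly, while the batch term $p_2 = \lambda_1\lambda_2$ is what the approximation discards. It then convolves two arrival-probability histograms into the aggregate histogram, assuming independent channels. The histograms are represented as hypothetical dictionaries that map arrival probability to relative frequency, and the function names are our own.

```python
from collections import defaultdict


def batch_probs(l1, l2):
    """Eqns (6)-(8): probabilities of 0, 1 and 2 arrivals per cycle."""
    p0 = (1.0 - l1) * (1.0 - l2)
    p1 = l1 * (1.0 - l2) + l2 * (1.0 - l1)
    p2 = l1 * l2
    return p0, p1, p2


def convolve_histograms(h1, h2, ndigits=12):
    """Aggregate histogram of lam1 + lam2 for two independent channels."""
    agg = defaultdict(float)
    for a, fa in h1.items():
        for b, fb in h2.items():
            agg[round(a + b, ndigits)] += fa * fb   # rounding merges float-equal sums
    return dict(agg)


if __name__ == "__main__":
    p0, p1, p2 = batch_probs(0.05, 0.08)
    print(p0 + p1 + p2, p1 + 2 * p2, p2)   # total, mean arrivals, neglected batch term
    h = convolve_histograms({0.05: 0.7, 0.15: 0.3}, {0.08: 0.5, 0.12: 0.5})
    print(sorted(h.items()))
```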

Figure 4 shows some typical histogram and cell-level results for the interaction of two independent VBR streams in a common buffer. The streams were first converted into individual arrival-rate histograms, which were then convolved to form the aggregate histogram. Although the curves diverge widely in some parts of the graph, their general agreement in shape illustrates the value of the technique. In view of the accuracy of the one-channel case, these errors are almost certainly associated with the aggregation approximation (Section 3) and/or the histogram convolution technique, which is, strictly speaking, applicable only to stationary stochastic processes. Extending the time domain would therefore be expected to improve prediction accuracy.

Figure 5: Illustration of cell-rate model, and comparison with the predictions of cell-level and histogram simulations. (a) Time-dependent queueing profiles; (b) comparison of histogram and cell-rate models.


4  CELL-RATE  METHOD 


The cell-rate simulation technique described below is a simplified version of that published by Pitts et al. (1994a,b). Its main purpose here is to show how the primary properties of this algorithm differ from those of the histogram method described above.

Basically, the cell-rate model ignores the discrete nature of the cell streams and represents them as continuous fluids modulated by a burst traffic profile. This profile is composed of constant cell-rate bursts, punctuated by discontinuous cell-rate changes known as events. Within each burst, when the buffer is neither full nor empty, the number $n(t)$ of cells in the queue varies with time $t$ (cycles) according to the equation

$$n(t) = n_0 + \left(\sum_{i=1}^{k} \lambda_i - \mu\right)(t - t_0), \qquad 0 < n(t) < N, \tag{9}$$

where $t_0$ is the time at which the burst began, $n_0$ is the number of queued cells at $t = t_0$, and $k$ is the number of multiplexed streams. This transient phase ends when the queue becomes full or empty, and $n$ then remains equal to $N$ or 0 until the end of the burst. Figure 5(a) shows these transient and steady-state phases compared with the corresponding cell-level simulations for an initially empty Geo/Geo/1/N queue. When the buffer is full and the aggregate cell rate exceeds the server capacity, losses occur at a rate of $(\sum_i \lambda_i - \mu)$ cells per cycle and are distributed between Channels 1 and 2 according to the ratio $\lambda_1 : \lambda_2$.
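The burst-by-burst fluid update of Eqn. (9) can be sketched as below. This is a simplified, hypothetical illustration, not the Pitts et al. algorithm. It assumes that each burst is given as a list of per-channel rates plus a duration in cycles, and that the queue saturates at $N$ or 0 as described above. It also assumes that lost fluid is shared between channels in proportion to their rates. The function names are our own.

```python
def fluid_burst(n0, rates, mu, duration, N):
    """Advance the fluid queue through one constant-rate burst; return (n_end, losses)."""
    total = sum(rates)
    drift = total - mu                        # slope of Eqn (9)
    lost = [0.0] * len(rates)
    if drift > 0:
        t_fill = (N - n0) / drift             # transient phase until the buffer is full
        if t_fill >= duration:
            return n0 + drift * duration, lost
        excess = drift * (duration - t_fill)  # lost at (sum lam - mu) cells per cycle
        return float(N), [excess * r / total for r in rates]
    return max(0.0, n0 + drift * duration), lost   # draining (or flat) queue


def fluid_track(bursts, mu, N, n0=0.0):
    """Track buffer occupancy from burst to burst, accumulating per-channel losses."""
    n, lost_total = n0, None
    for rates, duration in bursts:
        n, lost = fluid_burst(n, rates, mu, duration, N)
        lost_total = lost if lost_total is None else [a + b for a, b in zip(lost_total, lost)]
    return n, lost_total
```

For example, `fluid_track([([0.3, 0.3], 200.0), ([0.1, 0.1], 50.0)], mu=0.4, N=20)` fills the buffer after $N/(\sum_i \lambda_i - \mu) = 100$ cycles and loses about 20 cells, shared equally between the two channels. The second burst then drains the queue to about 10 cells. Each burst costs a constant amount of work regardless of its length, which is the source of the method's speed-up over cell-level simulation.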

Unlike the Pitts et al. model (which handles network events concurrently), the cell-rate simulator program tracks buffer occupancy from burst to burst throughout the simulation, recording the number of lost cells. Figure 5(b) shows some typical cell-loss characteristics compared with the results of the cell-level and histogram models. During a high arrival-rate burst, when $\sum_i \lambda_i/\mu > 1$ and cell losses become significant, the length of the expanding queue is constrained by the buffer capacity. The transient phase can therefore be expected to be of the same order as the buffer-filling time, i.e. $t_{\text{tran}} \approx N/(\sum_i \lambda_i - \mu)$. This explains the observed increase in the accuracy of the cell-rate model with increasing $N$. However, when $N$ is small and $t_{\text{burst}} \gg t_{\text{tran}}$, transient phenomena can be neglected entirely and the queue assumed to be in equilibrium throughout the simulation. Hence, when $N$ becomes small, the histogram model provides the best predictions.

Figure 6: Illustration of proposed hybrid cell-rate/histogram model. (a) Time-domain representation; (b) probability distributions.


5  CONCLUSIONS  AND  FUTURE  WORK 

This  paper  has  presented  some  early  results  from  an  ongoing  study  of  computer-aided 
communication-network  modelling.  The  initial  stage  of  the  work  involved  the  design  and 
testing  of  a  cell-level  simulator  for  a  single-server  FIFO  buffer  with  a  geometric  read-time 
distribution,  fed  by  two  independent  cell-streams.  (This  system  could  provide  a  module  in  a 
full  network  simulator).  The  same  system  was  also  modelled  using  the  cell-rate  technique  and 
a  histogram  model  based  upon  statistical  queueing  theory.  Although  both  these  models 
produced  results  within  significantly  shorter  run-times  than  the  cell-level  simulator,  they  were 
found  to  be  accurate  only  within  certain  regions  of  parameter-space.  We  now  consider  the 
possibility  of  a  hybrid  model,  combining  the  respective  virtues  of  these  two  algorithms. 

Such an algorithm would be required to model the stochastic nature of both the transient and steady-state conditions of operation. Several transient models are available for the unbounded, continuous-time M/M/1/∞ queue, and a computationally efficient formula has recently been developed at Bradford University (Bunday, 1995). However, the finite capacity of the ATM buffer presents severe theoretical difficulties. One possible solution is illustrated in Figure 6(a): the cell-rate model is applied during all transient phases of operation, while the statistical equilibrium model is used during periods of statistical equilibrium (i.e. when the value of $n$ predicted by the cell-rate model is either 0 or $N$). The cell-delay distribution in a transient phase might be represented to some degree of accuracy by a rectangular function of width pTV and height l/pA (Figure 6(b)).

It  should  be  noted  that  the  results  represent  only  the  most  preliminary  findings  of  an 
ongoing  study  of  network  modelling,  and  are  not  intended  to  form  a  definitive  treatment  of 
the  subject.  Investigations  have  so  far  been  confined  to  a  single  network  component,  consisting 
of  a  single  queue  and  a  single  server,  under  relatively  simple  traffic-loading  conditions 
(although  some  of  the  bulk  statistics  were  based  upon  a  realistic  VBR  video-source  model). 
The  model  must  ultimately  be  extended  to  cover  a  network  of  many  such  interconnected  units 
under  more  generalized  traffic,  which  may  include  such  complicating  effects  as  correlation  in 
the  cell-generation  process  (Skelly,  1994).  The  validity  of  the  resulting  model  must  then  be 
checked  by  comparing  its  predictions  against  the  operational  statistics  of  an  actual  hardware 
network  under  realistic  traffic-loading  conditions. 


6  ACKNOWLEDGEMENTS 

The  authors  wish  to  thank  all  members  of  the  High  Speed  Networks  research  group  at 
Loughborough  University  for  their  suggestions  and  technical  assistance.  The  work  was  funded 
by  an  EPSRC-ROPA  grant. 


7  REFERENCES 

Bunday, B. (1995) Dept. of Mathematics, University of Bradford, private communication.

Dafermos, S.C., Neuts, M.F. (1971) A Single Server Queue in Discrete Time, Cahiers du Centre d'Études de Recherche Opérationnelle, 19, 23-40.

Frost, V.S., Wood Larue, W., Shanmugan, K.S. (1988) Efficient Techniques for the Simulation of Computer Communications Networks, IEEE J-SAC, 6, 146-57.

Hashida, O., Takahashi, Y., Shimogawa, S. (1991) Switched Batch Bernoulli Process (SBBP) and the Discrete-Time SBBP/G/1 Queue with Application to Statistical Multiplexer Performance, IEEE J-SAC, 9, 394-401.

Kurose, J.F., Mouftah, H.T. (1988) Computer-Aided Modelling, Analysis, and Design of Communication Networks, IEEE J-SAC, 6, 130-45.

Parish, D.J., Rogers, C., Nche, C., Ruiz, I. (1994) Modelling the Orwell Network Access Protocol on a Slotted Ring, in Computer and Telecommunication Systems Performance Engineering (eds. M.E. Woodward, S. Datta, S. Szumko), Pentech Press, London, 108-13.

Pitts, J.M., Cuthbert, L.G., Bocci, M., Scharf, E.M. (1994a) Cell Rate Modelling: An Accelerated Simulation Technique for ATM Networks, ibid., 94-107.

Pitts, J.M., Cuthbert, L.G., Bocci, M., Scharf, E.M. (1994b) An Accelerated Simulation Technique for Modelling Burst-Scale Queueing Behaviour in ATM, Proceedings of the 14th International Teletraffic Congress, 1, 777-86.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P. (1988) Numerical Recipes in C, Cambridge University Press.

Skelly, P., Schwartz, M., Dixit, S. (1993) A Histogram-Based Model for Video Traffic Behaviour in an ATM Multiplexer, IEEE/ACM Trans. Networking, 1(4), 446-59.




Part  Eight  Performance  Modelling  Studies 


8  BIOGRAPHIES 

Martin  J.  Tunnicliffe  holds  B.Eng.  and  Ph.D.  degrees  from  the  Universities  of  Bradford  and 
Loughborough  respectively.  His  early  research  was  in  the  field  of  semiconductor  reliability, 
but  he  has  more  recently  been  involved  in  the  monitoring  and  analysis  of  communication 
networks.  He  is  currently  employed  as  a  contract  researcher  in  the  High  Speed  Networks 
Group  at  Loughborough  University. 

David J. Parish holds B.Sc. and Ph.D. degrees from the University of Liverpool. He has worked as a Scientific Officer at the UKAEA Culham Laboratory and as a Demonstrator in the Electrical Engineering Department at Liverpool University. Since 1983 he has held the positions of Lecturer and later Senior Lecturer in the Department of Electronic and Electrical Engineering at Loughborough University. His research interests concern the management, operation, monitoring and application of High Performance Networks. Specifically, he leads Loughborough's input to the BT-funded research programme into the management of high speed networks using SuperJanet.


INDEX  OF  CONTRIBUTORS 


Ajmone  Marsan,  M.  175 
Arvidsson,  A.  39 

Bhabuta,  M.  287 
Bianco,  A.  175 
Bose,  S.K.  22 

Casals,  O.  400 
Cerdà, L. 400
Chalasani,  S.  269 
Cigno,  R.L.  175 
Conti,  M.  3 

Dagiuklas,  A.  358 
De  Laet,  G.  342 

Fan,  Z.  74 
Feng,  Y.  233 
Fiche,  G.  381 

Garcia,  J.  400 
Gelenbe,  E.  233 
Ghanbari,  M.  358 
Gravey,  A.  57 
Gregori,  E.  3 
Griffiths,  J.M.  327 


Gustafsson,  E.  110 

Hagesteijn,  G.A.  414 
Halberstadt,  S.  57 
Harrison,  P.  287 
Hawker,  I.  133 
Hoeksema,  F.  92 

Karlsson,  G.  110 
Kofman,  D.  57 
Kokkinakis,  G.  153 
Kouvatsos,  D.  287 
Kroeze,  J.  92 

Le  Palud,  Cl.  381 
Lind,  C.  39 
Liotopoulos,  F.K.  269 
Logothetis,  M.  153 

Mang,  X.  233 
Mars,  P.  74 
Meyer,  J.F.  249 
Montagna,  S.  249 
Munafò, M. 175
Murphy,  J.  197 
Murphy,  L.  197 


Naudts,  J.  342 
Nicola,  V.F.  414 

Paglino,  R.  249 
Papanikos,  I.  153 
Parish,  D.J.  431 
Pitts,  J.M.  327 

Rao,  T.S.  22 
Rouillard,  S.  381 

Smith,  D.G.  133 
Srivathsan,  K.R.  22 

Truffet,  L.  215 
Tunnicliffe,  M.J.  431 
Tye,  B.J.  358 

Veitch, P.A. 133

Wilkinson,  J.  287 
Witters,  J.  92 

Yin,  X.W.  342 


KEYWORD INDEX

ABR 175
Accuracy 39
Asymmetrical Clos networks 269
Asynchronous Transfer Mode (ATM)
  switch architectures 287
ATM 3, 74, 92, 175, 327, 381
  cell level traffic model 39
  network performance prediction 233
  networks 57, 110, 153, 197, 414, 431
  switch 249, 342
  switches 269
  traffic simulation 342
  virtual paths 133

B-ISDN 92
Bandwidth 327
  allocation 233, 342
Banyan network 287
Blocking 381
Broad-band 381
Bursty traffic model 39

Call
  acceptance 381
  admission control 233
CBR 92
Cell
  delay variation 381
  loss 414
CLOS network 381
Composite technique 381
Compound Poisson Process (CPP) 287
Computer communication networks 269
Congestion control 197
Connection admission control 342
Connectionless services 57

Diffusion model 233
Discrete time Markovian models 215
Dynamic
  feedback 197
  routing control 153

Equivalent capacity 110

FIR neural networks 74

GCRA 92
Generalised Exponential (GE) distribution 287
Generic cell rate algorithm 342

Importance sampling 414
Instantaneous bandwidth available 22

Lumpability 215

Markov
  chain 3
  decision processes 57
  modulated
    Bernoulli Process 39
    Poisson Process 39
Maximum Entropy (ME) principle 22, 287
Measurements 92
MMBP 39
MMPP 39
MPEG 3
Multi
  -path routing 110
  -stage interconnection network 215
Multirate networks 269
Multistage Interconnection Network (MIN) 287

Narrow-Band 381
Network parameter control 342
Nonblocking operation 269

On-off source 22, 249

Performance 381
Policing 327
Pricing 197

Quality of Service (QoS) 233, 414
Queueing 327
  Network Model (QNM) 287
  theory 233

Rare event simulation 414
Repetitive-Service (RS) blocking mechanism 287
Restoration 133
Routing 133

Shared buffers 249
Simulation 92, 175
  techniques 431
Statistical
  analysis 431
  multiplexer 22
  multiplexing 3, 381
Strong ordering 215
Survivable network design 133
Switching 327
  network 381

TCP 175
Throughput analysis 92
Traffic
  and congestion control 358
  control 110, 175
  dispersion 110
  management 57
  prediction 74
  shaping 175

UPC 92
Usage parameter control 342

Variable bit rate video 3
Veinott’s criterion 215
Virtual path bandwidth control 153




ATM Networks

Performance Modelling and Evaluation Volume 2

Edited by Demetres Kouvatsos

Unlike many books on Asynchronous Transfer Mode, this text approaches the subject
systematically and reflects the state-of-the-art technology being applied throughout
the world today. In addition, it provides a fundamental source of reference in the
ATM field.

The following topics are discussed in detail:

• traffic modelling and characterisation;
• traffic and congestion control;
• bandwidth and admission control;
• ATM switch architecture;
• models of ATM switches;
• routing and optimisation;
• quality of service;
• network management;
• high speed LANs and MANs;
• performance modelling studies.

The book presents expanded research papers selected from the Third IFIP
Workshop on Performance Modelling and Evaluation of ATM Networks, sponsored
by the International Federation for Information Processing (IFIP) and held in July
1995 in Ilkley, UK. It is ideal for personnel in the computer and communication
industries and for academic and research staff in computer science and electrical
engineering.

Demetres Kouvatsos is a Reader in Computer Systems Modelling at the University
of Bradford, UK.

Also available from Chapman & Hall

Performance Modelling and Evaluation of ATM Networks Volume 1

Edited by Demetres D. Kouvatsos
Hardback (0 412 71140 0), 040 pages

Information Network and Data Communication

Edited by Finn A. Aagesen, Harald Botnevik and Dipak Khakhar
Hardback (0 412 75750 8), 4b I pages

Intelligent Networks and New Technologies

Edited by Jorgen Norgaard and Villy B. Iversen
Hardback (0 412 78900 0), 108 pages

CHAPMAN & HALL

London • Weinheim • New York • Tokyo • Melbourne • Madras


